Overview

Dataset statistics

Number of variables: 118
Number of observations: 28494
Missing cells: 2902201
Missing cells (%): 86.3%
Total size in memory: 120.6 MiB
Average record size in memory: 4.3 KiB
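
The two derived figures in this table can be recomputed from the others. The sketch below is a consistency check, assuming that the missing-cell percentage is taken over all rows times all columns and that 1 MiB equals 1024 KiB; the function names are hypothetical.

```python
# Figures copied from the "Dataset statistics" table.
N_VARIABLES = 118
N_OBSERVATIONS = 28494
MISSING_CELLS = 2902201
TOTAL_SIZE_MIB = 120.6


def missing_cell_share(missing_cells, n_rows, n_cols):
    """Fraction of all cells (rows x columns) that are missing."""
    return missing_cells / (n_rows * n_cols)


def average_record_kib(total_mib, n_rows):
    """Average in-memory size of one record, in KiB."""
    return total_mib * 1024 / n_rows


share = missing_cell_share(MISSING_CELLS, N_OBSERVATIONS, N_VARIABLES)
record = average_record_kib(TOTAL_SIZE_MIB, N_OBSERVATIONS)
print(f"Missing cells: {share:.1%}")          # 86.3%
print(f"Average record size: {record:.1f} KiB")  # 4.3 KiB
```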

Variable types

Categorical: 91
Numeric: 5
Unsupported: 12
URL: 10

Alerts

deviceCheckVoice has constant value "" [Constant]
_id has a high cardinality: 28494 distinct values [High cardinality]
application_id has a high cardinality: 3878 distinct values [High cardinality]
createdAt has a high cardinality: 28494 distinct values [High cardinality]
trial has a high cardinality: 2810 distinct values [High cardinality]
user has a high cardinality: 3835 distinct values [High cardinality]
voice_1.fileName has a high cardinality: 3139 distinct values [High cardinality]
voice_10.fileName has a high cardinality: 3277 distinct values [High cardinality]
voice_2.fileName has a high cardinality: 2923 distinct values [High cardinality]
voice_3.fileName has a high cardinality: 3061 distinct values [High cardinality]
voice_4.fileName has a high cardinality: 3139 distinct values [High cardinality]
voice_5.fileName has a high cardinality: 3061 distinct values [High cardinality]
voice_6.fileName has a high cardinality: 3061 distinct values [High cardinality]
voice_7.fileName has a high cardinality: 3277 distinct values [High cardinality]
voice_8.fileName has a high cardinality: 3139 distinct values [High cardinality]
voice_9.fileName has a high cardinality: 3277 distinct values [High cardinality]
englishGrammar_12 is highly imbalanced (80.5%) [Imbalance]
englishGrammar_13 is highly imbalanced (70.7%) [Imbalance]
englishGrammar_14 is highly imbalanced (60.9%) [Imbalance]
englishGrammar_15 is highly imbalanced (75.7%) [Imbalance]
englishGrammar_16 is highly imbalanced (82.2%) [Imbalance]
englishGrammar_17 is highly imbalanced (50.4%) [Imbalance]
englishGrammar_18 is highly imbalanced (83.4%) [Imbalance]
englishGrammar_2 is highly imbalanced (69.0%) [Imbalance]
englishGrammar_21 is highly imbalanced (52.2%) [Imbalance]
englishGrammar_24 is highly imbalanced (73.5%) [Imbalance]
englishGrammar_25 is highly imbalanced (92.5%) [Imbalance]
englishGrammar_26 is highly imbalanced (90.1%) [Imbalance]
englishGrammar_27 is highly imbalanced (51.9%) [Imbalance]
englishGrammar_3 is highly imbalanced (70.0%) [Imbalance]
englishGrammar_4 is highly imbalanced (63.5%) [Imbalance]
englishGrammar_6 is highly imbalanced (53.2%) [Imbalance]
englishGrammar_8 is highly imbalanced (61.7%) [Imbalance]
listening_1 is highly imbalanced (57.2%) [Imbalance]
listening_3 is highly imbalanced (67.0%) [Imbalance]
listening_4 is highly imbalanced (54.3%) [Imbalance]
listening_9 is highly imbalanced (57.6%) [Imbalance]
readingComprehension_2 is highly imbalanced (72.4%) [Imbalance]
readingComprehension_3 is highly imbalanced (70.4%) [Imbalance]
readingComprehension_5 is highly imbalanced (64.2%) [Imbalance]
situationalJudgement_1 is highly imbalanced (71.7%) [Imbalance]
situationalJudgement_10 is highly imbalanced (75.7%) [Imbalance]
situationalJudgement_11 is highly imbalanced (75.2%) [Imbalance]
situationalJudgement_13 is highly imbalanced (92.2%) [Imbalance]
situationalJudgement_14 is highly imbalanced (92.1%) [Imbalance]
situationalJudgement_15 is highly imbalanced (92.2%) [Imbalance]
situationalJudgement_2 is highly imbalanced (82.9%) [Imbalance]
situationalJudgement_4 is highly imbalanced (86.0%) [Imbalance]
situationalJudgement_5 is highly imbalanced (55.7%) [Imbalance]
situationalJudgement_7 is highly imbalanced (73.1%) [Imbalance]
situationalJudgement_9 is highly imbalanced (80.8%) [Imbalance]
voice_1.prompt is highly imbalanced (58.4%) [Imbalance]
voice_10.prompt is highly imbalanced (93.7%) [Imbalance]
voice_2.prompt is highly imbalanced (89.8%) [Imbalance]
voice_3.prompt is highly imbalanced (57.6%) [Imbalance]
voice_4.prompt is highly imbalanced (53.6%) [Imbalance]
voice_6.prompt is highly imbalanced (56.5%) [Imbalance]
voice_7.prompt is highly imbalanced (59.9%) [Imbalance]
voice_8.prompt is highly imbalanced (58.5%) [Imbalance]
voice_9.prompt is highly imbalanced (96.5%) [Imbalance]
automaticScore has 25212 (88.5%) missing values [Missing]
deviceCheckVoice has 25220 (88.5%) missing values [Missing]
englishGrammar_1 has 26183 (91.9%) missing values [Missing]
englishGrammar_10 has 26998 (94.7%) missing values [Missing]
englishGrammar_11 has 27004 (94.8%) missing values [Missing]
englishGrammar_12 has 27021 (94.8%) missing values [Missing]
englishGrammar_13 has 26986 (94.7%) missing values [Missing]
englishGrammar_14 has 27052 (94.9%) missing values [Missing]
englishGrammar_15 has 26157 (91.8%) missing values [Missing]
englishGrammar_16 has 26115 (91.7%) missing values [Missing]
englishGrammar_17 has 26156 (91.8%) missing values [Missing]
englishGrammar_18 has 26130 (91.7%) missing values [Missing]
englishGrammar_19 has 26585 (93.3%) missing values [Missing]
englishGrammar_2 has 26136 (91.7%) missing values [Missing]
englishGrammar_20 has 26581 (93.3%) missing values [Missing]
englishGrammar_21 has 26595 (93.3%) missing values [Missing]
englishGrammar_22 has 26624 (93.4%) missing values [Missing]
englishGrammar_23 has 27028 (94.9%) missing values [Missing]
englishGrammar_24 has 26993 (94.7%) missing values [Missing]
englishGrammar_25 has 27031 (94.9%) missing values [Missing]
englishGrammar_26 has 27060 (95.0%) missing values [Missing]
englishGrammar_27 has 27097 (95.1%) missing values [Missing]
englishGrammar_28 has 27049 (94.9%) missing values [Missing]
englishGrammar_3 has 26059 (91.5%) missing values [Missing]
englishGrammar_4 has 26084 (91.5%) missing values [Missing]
englishGrammar_5 has 26666 (93.6%) missing values [Missing]
englishGrammar_6 has 26676 (93.6%) missing values [Missing]
englishGrammar_7 has 26646 (93.5%) missing values [Missing]
englishGrammar_8 has 26600 (93.4%) missing values [Missing]
englishGrammar_9 has 27001 (94.8%) missing values [Missing]
final.accuracy has 25449 (89.3%) missing values [Missing]
final.wpm has 25449 (89.3%) missing values [Missing]
listening_1 has 25236 (88.6%) missing values [Missing]
listening_10 has 25222 (88.5%) missing values [Missing]
listening_2 has 25222 (88.5%) missing values [Missing]
listening_3 has 25240 (88.6%) missing values [Missing]
listening_4 has 25219 (88.5%) missing values [Missing]
listening_5 has 25238 (88.6%) missing values [Missing]
listening_6 has 25215 (88.5%) missing values [Missing]
listening_7 has 25210 (88.5%) missing values [Missing]
listening_8 has 25225 (88.5%) missing values [Missing]
listening_9 has 25213 (88.5%) missing values [Missing]
percent has 11232 (39.4%) missing values [Missing]
readingComprehension_1 has 25001 (87.7%) missing values [Missing]
readingComprehension_10 has 25140 (88.2%) missing values [Missing]
readingComprehension_11 has 25377 (89.1%) missing values [Missing]
readingComprehension_2 has 25023 (87.8%) missing values [Missing]
readingComprehension_3 has 25000 (87.7%) missing values [Missing]
readingComprehension_4 has 25007 (87.8%) missing values [Missing]
readingComprehension_5 has 25012 (87.8%) missing values [Missing]
readingComprehension_6 has 25018 (87.8%) missing values [Missing]
readingComprehension_7 has 25000 (87.7%) missing values [Missing]
readingComprehension_8 has 25030 (87.8%) missing values [Missing]
readingComprehension_9 has 25014 (87.8%) missing values [Missing]
score has 11229 (39.4%) missing values [Missing]
scoreBreakdown.pickIncorrect has 28494 (100.0%) missing values [Missing]
scoreBreakdown.tenses has 28494 (100.0%) missing values [Missing]
scoreBreakdown.wordSelection has 28494 (100.0%) missing values [Missing]
situationalJudgement_1 has 25837 (90.7%) missing values [Missing]
situationalJudgement_10 has 26378 (92.6%) missing values [Missing]
situationalJudgement_11 has 25957 (91.1%) missing values [Missing]
situationalJudgement_12 has 26433 (92.8%) missing values [Missing]
situationalJudgement_13 has 25968 (91.1%) missing values [Missing]
situationalJudgement_14 has 25826 (90.6%) missing values [Missing]
situationalJudgement_15 has 26387 (92.6%) missing values [Missing]
situationalJudgement_2 has 26363 (92.5%) missing values [Missing]
situationalJudgement_3 has 26420 (92.7%) missing values [Missing]
situationalJudgement_4 has 26402 (92.7%) missing values [Missing]
situationalJudgement_5 has 26382 (92.6%) missing values [Missing]
situationalJudgement_6 has 26395 (92.6%) missing values [Missing]
situationalJudgement_7 has 26382 (92.6%) missing values [Missing]
situationalJudgement_8 has 26406 (92.7%) missing values [Missing]
situationalJudgement_9 has 25820 (90.6%) missing values [Missing]
total has 11229 (39.4%) missing values [Missing]
trial has 25415 (89.2%) missing values [Missing]
voice_1.GCSData has 28494 (100.0%) missing values [Missing]
voice_1.audioUrl has 25369 (89.0%) missing values [Missing]
voice_1.fileName has 25350 (89.0%) missing values [Missing]
voice_1.prompt has 25350 (89.0%) missing values [Missing]
voice_10.audioUrl has 25237 (88.6%) missing values [Missing]
voice_10.fileName has 25212 (88.5%) missing values [Missing]
voice_10.prompt has 25212 (88.5%) missing values [Missing]
voice_2.GCSData has 28494 (100.0%) missing values [Missing]
voice_2.audioUrl has 25572 (89.7%) missing values [Missing]
voice_2.fileName has 25567 (89.7%) missing values [Missing]
voice_2.prompt has 25567 (89.7%) missing values [Missing]
voice_3.GCSData has 28494 (100.0%) missing values [Missing]
voice_3.audioUrl has 25438 (89.3%) missing values [Missing]
voice_3.fileName has 25429 (89.2%) missing values [Missing]
voice_3.prompt has 25429 (89.2%) missing values [Missing]
voice_4.GCSData has 28494 (100.0%) missing values [Missing]
voice_4.audioUrl has 25369 (89.0%) missing values [Missing]
voice_4.fileName has 25350 (89.0%) missing values [Missing]
voice_4.prompt has 25350 (89.0%) missing values [Missing]
voice_5.GCSData has 28494 (100.0%) missing values [Missing]
voice_5.audioUrl has 25439 (89.3%) missing values [Missing]
voice_5.fileName has 25429 (89.2%) missing values [Missing]
voice_5.prompt has 25429 (89.2%) missing values [Missing]
voice_6.GCSData has 28494 (100.0%) missing values [Missing]
voice_6.audioUrl has 25438 (89.3%) missing values [Missing]
voice_6.fileName has 25429 (89.2%) missing values [Missing]
voice_6.prompt has 25429 (89.2%) missing values [Missing]
voice_7.GCSData has 28494 (100.0%) missing values [Missing]
voice_7.audioUrl has 25237 (88.6%) missing values [Missing]
voice_7.fileName has 25212 (88.5%) missing values [Missing]
voice_7.prompt has 25212 (88.5%) missing values [Missing]
voice_8.GCSData has 28494 (100.0%) missing values [Missing]
voice_8.audioUrl has 25369 (89.0%) missing values [Missing]
voice_8.fileName has 25350 (89.0%) missing values [Missing]
voice_8.prompt has 25350 (89.0%) missing values [Missing]
voice_9.audioUrl has 25236 (88.6%) missing values [Missing]
voice_9.fileName has 25212 (88.5%) missing values [Missing]
voice_9.prompt has 25212 (88.5%) missing values [Missing]
_id is uniformly distributed [Uniform]
application_id is uniformly distributed [Uniform]
createdAt is uniformly distributed [Uniform]
trial is uniformly distributed [Uniform]
user is uniformly distributed [Uniform]
voice_1.fileName is uniformly distributed [Uniform]
voice_10.fileName is uniformly distributed [Uniform]
voice_2.fileName is uniformly distributed [Uniform]
voice_3.fileName is uniformly distributed [Uniform]
voice_4.fileName is uniformly distributed [Uniform]
voice_5.fileName is uniformly distributed [Uniform]
voice_6.fileName is uniformly distributed [Uniform]
voice_7.fileName is uniformly distributed [Uniform]
voice_8.fileName is uniformly distributed [Uniform]
voice_9.fileName is uniformly distributed [Uniform]
_id has unique values [Unique]
createdAt has unique values [Unique]
percent is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
scoreBreakdown.pickIncorrect is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
scoreBreakdown.tenses is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
scoreBreakdown.wordSelection is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
voice_1.GCSData is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
voice_2.GCSData is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
voice_3.GCSData is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
voice_4.GCSData is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
voice_5.GCSData is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
voice_6.GCSData is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
voice_7.GCSData is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
voice_8.GCSData is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
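
The Imbalance percentages in the alerts are consistent with a score of one minus the normalised Shannon entropy of the category counts. The sketch below recomputes the score for englishGrammar_12 and englishGrammar_14 from the counts listed in their Common Values tables. The function name is hypothetical, and the formula is inferred from the numbers in this report rather than quoted from the ydata-profiling source.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the counts.

    0.0 means perfectly balanced classes; values near 1.0 mean that one
    class dominates. Assumed formula, inferred from the report's numbers.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)


# Counts copied from the Common Values tables of the two variables.
print(f"englishGrammar_12: {imbalance_score([1393, 30, 28, 22]):.1%}")   # 80.5%
print(f"englishGrammar_14: {imbalance_score([1194, 214, 23, 11]):.1%}")  # 60.9%
```

Both results match the percentages shown in the corresponding alerts.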

Reproduction

Analysis started: 2023-02-08 10:40:50.928877
Analysis finished: 2023-02-08 10:41:28.020829
Duration: 37.09 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json
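
The reported duration follows directly from the two timestamps. A minimal check using only the Python standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2023-02-08 10:40:50.928877")
finished = datetime.fromisoformat("2023-02-08 10:41:28.020829")

duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")  # 37.09 seconds
```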

Variables

_id
Categorical

HIGH CARDINALITY  UNIFORM  UNIQUE 

Distinct: 28494
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB

Length

Max length: 17
Median length: 17
Mean length: 17
Min length: 17

Characters and Unicode

Total characters: 484398
Distinct characters: 55
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 28494
Unique (%): 100.0%

Common Values

Each of the 10 most common values occurs once (< 0.1%). Other values (28484): 28484 (> 99.9%).

Length

Histogram of lengths of the category

Most occurring characters

Counts of the 10 most frequent characters: 9009 (1.9%), 8980 (1.9%), 8977 (1.9%), 8975 (1.9%), 8939 (1.8%), 8920 (1.8%), 8910 (1.8%), 8900 (1.8%), 8900 (1.8%), 8898 (1.8%). Other values (45): 394990 (81.5%).

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 220551 | 45.5%
Uppercase Letter | 193595 | 40.0%
Decimal Number | 70252 | 14.5%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 414146 | 85.5%
Common | 70252 | 14.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 484398 | 100.0%

application_id
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct: 3878
Distinct (%): 13.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB

Length

Max length: 17
Median length: 17
Mean length: 17
Min length: 17

Characters and Unicode

Total characters: 484398
Distinct characters: 55
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 85
Unique (%): 0.3%

Common Values

Each of the 10 most common values occurs 11 times (< 0.1%). Other values (3868): 28384 (99.6%).

Length

Histogram of lengths of the category

Most occurring characters

Counts of the 10 most frequent characters: 9496 (2.0%), 9405 (1.9%), 9361 (1.9%), 9323 (1.9%), 9292 (1.9%), 9134 (1.9%), 9121 (1.9%), 9090 (1.9%), 9060 (1.9%), 9041 (1.9%). Other values (45): 392075 (80.9%).

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 220382 | 45.5%
Uppercase Letter | 194136 | 40.1%
Decimal Number | 69880 | 14.4%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 414518 | 85.6%
Common | 69880 | 14.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 484398 | 100.0%

automaticScore
Real number (ℝ)

Distinct: 3062
Distinct (%): 93.3%
Missing: 25212
Missing (%): 88.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 69.718207
Minimum: 0
Maximum: 96.824233
Zeros: 221
Zeros (%): 0.8%
Negative: 0
Negative (%): 0.0%
Memory size: 222.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 58.322642
median: 82.007264
Q3: 90.090525
95-th percentile: 93.95771
Maximum: 96.824233
Range: 96.824233
Interquartile range (IQR): 31.767883

Descriptive statistics

Standard deviation: 27.885505
Coefficient of variation (CV): 0.3999745
Kurtosis: 0.62540064
Mean: 69.718207
Median Absolute Deviation (MAD): 9.9783996
Skewness: -1.3243327
Sum: 228815.16
Variance: 777.60141
Monotonicity: Not monotonic
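
Several of the automaticScore statistics are derived from the others, so they can be cross-checked. The sketch below uses only the figures reported for this variable; the variable names are illustrative, and the standard definitions (CV = standard deviation / mean, IQR = Q3 - Q1, variance = standard deviation squared) are assumed to be the ones the report uses.

```python
# Figures copied from the automaticScore statistics.
mean = 69.718207
std_dev = 27.885505
q1, q3 = 58.322642, 90.090525
minimum, maximum = 0.0, 96.824233
total_sum = 228815.16
n_observations, n_missing = 28494, 25212

cv = std_dev / mean                  # coefficient of variation
iqr = q3 - q1                        # interquartile range
variance = std_dev ** 2
value_range = maximum - minimum
n_non_missing = total_sum / mean     # sum / mean recovers the value count

print(f"CV       = {cv:.7f}")
print(f"IQR      = {iqr:.6f}")
print(f"Variance = {variance:.4f}")
print(f"Range    = {value_range:.6f}")
print(f"Count    = {n_non_missing:.0f} (expected {n_observations - n_missing})")
```

The recovered count equals 28494 - 25212 = 3282, the number of non-missing rows.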
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 221 | 0.8%
86.5555387 | 1 | < 0.1%
81.25676641 | 1 | < 0.1%
74.67100702 | 1 | < 0.1%
75.91230446 | 1 | < 0.1%
39.76567487 | 1 | < 0.1%
51.4256542 | 1 | < 0.1%
60.29269483 | 1 | < 0.1%
75.12970533 | 1 | < 0.1%
74.38011456 | 1 | < 0.1%
Other values (3052) | 3052 | 10.7%
(Missing) | 25212 | 88.5%

Minimum 10 values

Value | Count | Frequency (%)
0 | 221 | 0.8%
0.1521550594 | 1 | < 0.1%
0.1872758567 | 1 | < 0.1%
0.2181366425 | 1 | < 0.1%
0.2794063445 | 1 | < 0.1%
0.3978414237 | 1 | < 0.1%
0.6250090081 | 1 | < 0.1%
0.8566477636 | 1 | < 0.1%
1.308530501 | 1 | < 0.1%
1.438634913 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
96.82423253 | 1 | < 0.1%
96.61722445 | 1 | < 0.1%
96.50841309 | 1 | < 0.1%
96.46214604 | 1 | < 0.1%
96.4543435 | 1 | < 0.1%
96.31357894 | 1 | < 0.1%
96.30015264 | 1 | < 0.1%
96.28455193 | 1 | < 0.1%
96.25887704 | 1 | < 0.1%
96.13473984 | 1 | < 0.1%

createdAt
Categorical

HIGH CARDINALITY  UNIFORM  UNIQUE 

Distinct: 28494
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

Length

Max length: 24
Median length: 24
Mean length: 24
Min length: 24

Characters and Unicode

Total characters: 683856
Distinct characters: 15
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 28494
Unique (%): 100.0%

Common Values

Each of the 10 most common values occurs once (< 0.1%). Other values (28484): 28484 (> 99.9%).

Length

Histogram of lengths of the category

Most occurring characters

Counts of the 10 most frequent characters: 122619 (17.9%), 107625 (15.7%), 64356 (9.4%), 56988 (8.3%), 56988 (8.3%), 38263 (5.6%), 29704 (4.3%), 28494 (4.2%), 28494 (4.2%), 28494 (4.2%). Other values (5): 121831 (17.8%).

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 484398 | 70.8%
Other Punctuation | 85482 | 12.5%
Dash Punctuation | 56988 | 8.3%
Uppercase Letter | 56988 | 8.3%

Most occurring scripts

Value | Count | Frequency (%)
Common | 626868 | 91.7%
Latin | 56988 | 8.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 683856 | 100.0%
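
The Unicode category totals for createdAt, divided by the 28494 rows, give 17 decimal digits, 3 "Other Punctuation" characters, 2 dashes, and 2 uppercase letters per value. The sketch below shows how such category counts can be obtained with Python's standard `unicodedata` module. The sample string is a hypothetical 24-character ISO 8601 timestamp chosen for illustration; it is not taken from the dataset, and the actual createdAt format is not shown in this report.

```python
import unicodedata
from collections import Counter

# Hypothetical 24-character timestamp (assumption, not a value from the data).
SAMPLE = "2023-02-08T10:40:50.928Z"


def category_counts(text):
    """Count characters by Unicode general category (e.g. Nd, Po, Pd, Lu)."""
    return Counter(unicodedata.category(ch) for ch in text)


counts = category_counts(SAMPLE)
print(len(SAMPLE))   # 24
print(counts["Nd"])  # 17 decimal digits
print(counts["Po"])  # 3 other punctuation (two colons, one full stop)
print(counts["Pd"])  # 2 dash punctuation
print(counts["Lu"])  # 2 uppercase letters
```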

deviceCheckVoice
Categorical

CONSTANT  MISSING 

Distinct: 1
Distinct (%): < 0.1%
Missing: 25220
Missing (%): 88.5%
Memory size: 1012.1 KiB

Length

Max length: 13
Median length: 13
Mean length: 13
Min length: 13

Characters and Unicode

Total characters: 42562
Distinct characters: 9
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

The single distinct value occurs 3274 times (11.5%). (Missing): 25220 (88.5%).

Length

Histogram of lengths of the category

Common Values (Plot)

Counts: 3274 (100.0%).

Most occurring characters

Counts of the 9 distinct characters: 9822 (23.1%), 6548 (15.4%), 6548 (15.4%), 3274 (7.7%), 3274 (7.7%), 3274 (7.7%), 3274 (7.7%), 3274 (7.7%), 3274 (7.7%).

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 39288 | 92.3%
Uppercase Letter | 3274 | 7.7%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 42562 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 42562 | 100.0%

englishGrammar_1
Categorical

Distinct: 4
Distinct (%): 0.2%
Missing: 26183
Missing (%): 91.9%
Memory size: 975.3 KiB

Length

Max length: 13
Median length: 13
Mean length: 12.539161
Min length: 10

Characters and Unicode

Total characters: 28978
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

Counts of the 4 distinct values: 1504 (5.3%), 567 (2.0%), 222 (0.8%), 18 (0.1%). (Missing): 26183 (91.9%).

Length

Histogram of lengths of the category

Common Values (Plot)

Counts: 1726 (37.3%), 1504 (32.5%), 585 (12.7%), 567 (12.3%), 222 (4.8%), 18 (0.4%).

Most occurring characters

Counts of the 10 most frequent characters: 6348 (21.9%), 5541 (19.1%), 2311 (8.0%), 2311 (8.0%), 2311 (8.0%), 2311 (8.0%), 2089 (7.2%), 2071 (7.1%), 1726 (6.0%), 807 (2.8%). Other values (2): 1152 (4.0%).

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 24578 | 84.8%
Space Separator | 2311 | 8.0%
Other Punctuation | 2089 | 7.2%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 24578 | 84.8%
Common | 4400 | 15.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 28978 | 100.0%
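
For variables with many missing values, the reported Distinct (%) and Mean length appear to be computed over the non-missing rows only, not over all 28494 observations. The sketch below checks this reading against the figures for englishGrammar_1 and automaticScore; the helper names are hypothetical, and the denominator is an inference from the numbers in this report.

```python
N_OBSERVATIONS = 28494


def distinct_share(n_distinct, n_missing, n_rows=N_OBSERVATIONS):
    """Distinct values as a fraction of the non-missing rows."""
    return n_distinct / (n_rows - n_missing)


def mean_length(total_characters, n_missing, n_rows=N_OBSERVATIONS):
    """Average string length over the non-missing rows."""
    return total_characters / (n_rows - n_missing)


# englishGrammar_1: 4 distinct values, 26183 missing, 28978 characters in total.
print(f"{distinct_share(4, 26183):.1%}")    # 0.2%
print(f"{mean_length(28978, 26183):.6f}")   # 12.539161

# automaticScore: 3062 distinct values, 25212 missing.
print(f"{distinct_share(3062, 25212):.1%}")  # 93.3%
```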

englishGrammar_10
Categorical

Distinct: 4
Distinct (%): 0.3%
Missing: 26998
Missing (%): 94.7%
Memory size: 932.7 KiB

Length

Max length: 4
Median length: 4
Mean length: 3.8389037
Min length: 3

Characters and Unicode

Total characters: 5743
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

Counts of the 4 distinct values: 1159 (4.1%), 241 (0.8%), 70 (0.2%), 26 (0.1%). (Missing): 26998 (94.7%).

Length

Histogram of lengths of the category

Common Values (Plot)

Counts: 1159 (77.5%), 241 (16.1%), 70 (4.7%), 26 (1.7%).

Most occurring characters

Counts of the 10 most frequent characters: 1400 (24.4%), 1159 (20.2%), 1159 (20.2%), 1159 (20.2%), 241 (4.2%), 241 (4.2%), 140 (2.4%), 70 (1.2%), 70 (1.2%), 52 (0.9%). Other values (2): 52 (0.9%).

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 5743 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 5743 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5743 | 100.0%

englishGrammar_11
Categorical

Distinct: 4
Distinct (%): 0.3%
Missing: 27004
Missing (%): 94.8%
Memory size: 932.6 KiB

Length

Max length: 4
Median length: 4
Mean length: 3.9234899
Min length: 2

Characters and Unicode

Total characters: 5846
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

Counts of the 4 distinct values: 1157 (4.1%), 194 (0.7%), 82 (0.3%), 57 (0.2%). (Missing): 27004 (94.8%).

Length

Histogram of lengths of the category

Common Values (Plot)

Counts: 1157 (77.7%), 194 (13.0%), 82 (5.5%), 57 (3.8%).

Most occurring characters

Counts of the 10 distinct characters: 1433 (24.5%), 1239 (21.2%), 1239 (21.2%), 1157 (19.8%), 194 (3.3%), 194 (3.3%), 194 (3.3%), 82 (1.4%), 57 (1.0%), 57 (1.0%).

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 5846 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 5846 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5846 | 100.0%

englishGrammar_12
Categorical

IMBALANCE  MISSING 

Distinct: 4
Distinct (%): 0.3%
Missing: 27021
Missing (%): 94.8%
Memory size: 938.1 KiB

Length

Max length: 9
Median length: 8
Mean length: 8.0353021
Min length: 8

Characters and Unicode

Total characters: 11836
Distinct characters: 16
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

Counts of the 4 distinct values: 1393 (4.9%), 30 (0.1%), 28 (0.1%), 22 (0.1%). (Missing): 27021 (94.8%).

Length

Histogram of lengths of the category

Common Values (Plot)

Counts: 1393 (94.6%), 30 (2.0%), 28 (1.9%), 22 (1.5%).

Most occurring characters

Counts of the 10 most frequent characters: 2808 (23.7%), 2808 (23.7%), 1471 (12.4%), 1465 (12.4%), 1415 (12.0%), 1415 (12.0%), 110 (0.9%), 60 (0.5%), 56 (0.5%), 52 (0.4%). Other values (6): 176 (1.5%).

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 11836 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 11836 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 11836 | 100.0%

englishGrammar_13
Categorical

IMBALANCE  MISSING 

Distinct: 4
Distinct (%): 0.3%
Missing: 26986
Missing (%): 94.7%
Memory size: 940.6 KiB

Length

Max length: 10
Median length: 9
Mean length: 9.0059682
Min length: 7

Characters and Unicode

Total characters: 13581
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

Counts of the 4 distinct values: 1364 (4.8%), 81 (0.3%), 45 (0.2%), 18 (0.1%). (Missing): 26986 (94.7%).

Length

Histogram of lengths of the category

Common Values (Plot)

Counts: 1508 (50.0%), 1364 (45.2%), 81 (2.7%), 45 (1.5%), 18 (0.6%).

Most occurring characters

Counts of the 10 most frequent characters: 2728 (20.1%), 1652 (12.2%), 1508 (11.1%), 1508 (11.1%), 1508 (11.1%), 1508 (11.1%), 1445 (10.6%), 1364 (10.0%), 99 (0.7%), 99 (0.7%). Other values (2): 162 (1.2%).

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 10466 | 77.1%
Other Punctuation | 1607 | 11.8%
Space Separator | 1508 | 11.1%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 10466 | 77.1%
Common | 3115 | 22.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 13581 | 100.0%

englishGrammar_14
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.3%
Missing 27052
Missing (%) 94.9%
Memory size 930.5 KiB
Category counts 1194, 214, 23, 11

Length

Max length 6
Median length 3
Mean length 3.3522885
Min length 3

Characters and Unicode

Total characters 4834
Distinct characters 10
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1194
 
4.2%
214
 
0.8%
23
 
0.1%
11
 
< 0.1%
(Missing) 27052
94.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1194
82.8%
214
 
14.8%
23
 
1.6%
11
 
0.8%

Most occurring characters

ValueCountFrequency (%)
1217
25.2%
1217
25.2%
1194
24.7%
260
 
5.4%
237
 
4.9%
237
 
4.9%
225
 
4.7%
214
 
4.4%
22
 
0.5%
11
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4834
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
1217
25.2%
1217
25.2%
1194
24.7%
260
 
5.4%
237
 
4.9%
237
 
4.9%
225
 
4.7%
214
 
4.4%
22
 
0.5%
11
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 4834
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
1217
25.2%
1217
25.2%
1194
24.7%
260
 
5.4%
237
 
4.9%
237
 
4.9%
225
 
4.7%
214
 
4.4%
22
 
0.5%
11
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4834
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1217
25.2%
1217
25.2%
1194
24.7%
260
 
5.4%
237
 
4.9%
237
 
4.9%
225
 
4.7%
214
 
4.4%
22
 
0.5%
11
 
0.2%

englishGrammar_15
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 26157
Missing (%) 91.8%
Memory size 972.4 KiB
Category counts 2167, 79, 54, 37

Length

Max length 12
Median length 11
Mean length 10.845101
Min length 8

Characters and Unicode

Total characters 25345
Distinct characters 12
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
2167
 
7.6%
79
 
0.3%
54
 
0.2%
37
 
0.1%
(Missing) 26157
91.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2283
49.7%
2221
48.3%
54
 
1.2%
37
 
0.8%

Most occurring characters

ValueCountFrequency (%)
4558
18.0%
4504
17.8%
2337
9.2%
2337
9.2%
2337
9.2%
2337
9.2%
2283
9.0%
2283
9.0%
2258
8.9%
37
 
0.1%
Other values (2) 74
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 23087
91.1%
Space Separator 2258
 
8.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
4558
19.7%
4504
19.5%
2337
10.1%
2337
10.1%
2337
10.1%
2337
10.1%
2283
9.9%
2283
9.9%
37
 
0.2%
37
 
0.2%
Space Separator
ValueCountFrequency (%)
2258
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 23087
91.1%
Common 2258
 
8.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
4558
19.7%
4504
19.5%
2337
10.1%
2337
10.1%
2337
10.1%
2337
10.1%
2283
9.9%
2283
9.9%
37
 
0.2%
37
 
0.2%
Common
ValueCountFrequency (%)
2258
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 25345
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4558
18.0%
4504
17.8%
2337
9.2%
2337
9.2%
2337
9.2%
2337
9.2%
2283
9.0%
2283
9.0%
2258
8.9%
37
 
0.1%
Other values (2) 74
 
0.3%

englishGrammar_16
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 26115
Missing (%) 91.7%
Memory size 958.2 KiB
Category counts 2260, 67, 36, 16

Length

Max length 11
Median length 4
Mean length 4.1185372
Min length 4

Characters and Unicode

Total characters 9798
Distinct characters 10
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
2260
 
7.9%
67
 
0.2%
36
 
0.1%
16
 
0.1%
(Missing) 26115
91.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2260
94.4%
83
 
3.5%
36
 
1.5%
16
 
0.7%

Most occurring characters

ValueCountFrequency (%)
2462
25.1%
2379
24.3%
2379
24.3%
2379
24.3%
83
 
0.8%
36
 
0.4%
32
 
0.3%
16
 
0.2%
16
 
0.2%
16
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 9782
99.8%
Space Separator 16
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
2462
25.2%
2379
24.3%
2379
24.3%
2379
24.3%
83
 
0.8%
36
 
0.4%
32
 
0.3%
16
 
0.2%
16
 
0.2%
Space Separator
ValueCountFrequency (%)
16
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 9782
99.8%
Common 16
 
0.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
2462
25.2%
2379
24.3%
2379
24.3%
2379
24.3%
83
 
0.8%
36
 
0.4%
32
 
0.3%
16
 
0.2%
16
 
0.2%
Common
ValueCountFrequency (%)
16
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9798
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2462
25.1%
2379
24.3%
2379
24.3%
2379
24.3%
83
 
0.8%
36
 
0.4%
32
 
0.3%
16
 
0.2%
16
 
0.2%
16
 
0.2%

englishGrammar_17
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 26156
Missing (%) 91.8%
Memory size 968.4 KiB
Category counts 1633, 664, 34, 7

Length

Max length 10
Median length 10
Mean length 9.0804106
Min length 2

Characters and Unicode

Total characters 21230
Distinct characters 6
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1633
 
5.7%
664
 
2.3%
34
 
0.1%
7
 
< 0.1%
(Missing) 26156
91.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
3971
63.0%
2331
37.0%

Most occurring characters

ValueCountFrequency (%)
4662
22.0%
3971
18.7%
3971
18.7%
3964
18.7%
2331
11.0%
2331
11.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 17266
81.3%
Space Separator 3964
 
18.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
4662
27.0%
3971
23.0%
3971
23.0%
2331
13.5%
2331
13.5%
Space Separator
ValueCountFrequency (%)
3964
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 17266
81.3%
Common 3964
 
18.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
4662
27.0%
3971
23.0%
3971
23.0%
2331
13.5%
2331
13.5%
Common
ValueCountFrequency (%)
3964
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 21230
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4662
22.0%
3971
18.7%
3971
18.7%
3964
18.7%
2331
11.0%
2331
11.0%

englishGrammar_18
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 26130
Missing (%) 91.7%
Memory size 987.3 KiB
Category counts 2251, 80, 22, 11

Length

Max length 20
Median length 17
Mean length 16.918359
Min length 8

Characters and Unicode

Total characters 39995
Distinct characters 13
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
2251
 
7.9%
80
 
0.3%
22
 
0.1%
11
 
< 0.1%
(Missing) 26130
91.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2273
47.9%
2273
47.9%
91
 
1.9%
80
 
1.7%
33
 
0.7%

Most occurring characters

ValueCountFrequency (%)
6990
17.5%
6979
17.4%
4717
11.8%
2386
 
6.0%
2386
 
6.0%
2386
 
6.0%
2364
 
5.9%
2364
 
5.9%
2364
 
5.9%
2353
 
5.9%
Other values (3) 4706
11.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 37609
94.0%
Space Separator 2386
 
6.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
6990
18.6%
6979
18.6%
4717
12.5%
2386
 
6.3%
2386
 
6.3%
2364
 
6.3%
2364
 
6.3%
2364
 
6.3%
2353
 
6.3%
2353
 
6.3%
Other values (2) 2353
 
6.3%
Space Separator
ValueCountFrequency (%)
2386
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 37609
94.0%
Common 2386
 
6.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
6990
18.6%
6979
18.6%
4717
12.5%
2386
 
6.3%
2386
 
6.3%
2364
 
6.3%
2364
 
6.3%
2364
 
6.3%
2353
 
6.3%
2353
 
6.3%
Other values (2) 2353
 
6.3%
Common
ValueCountFrequency (%)
2386
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 39995
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6990
17.5%
6979
17.4%
4717
11.8%
2386
 
6.0%
2386
 
6.0%
2386
 
6.0%
2364
 
5.9%
2364
 
5.9%
2364
 
5.9%
2353
 
5.9%
Other values (3) 4706
11.8%

englishGrammar_19
Categorical

Distinct 4
Distinct (%) 0.2%
Missing 26585
Missing (%) 93.3%
Memory size 942.2 KiB
Category counts 1440, 165, 163, 141

Length

Max length 8
Median length 2
Mean length 2.7014144
Min length 2

Characters and Unicode

Total characters 5157
Distinct characters 12
Distinct categories 2
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1440
 
5.1%
165
 
0.6%
163
 
0.6%
141
 
0.5%
(Missing) 26585
93.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1440
75.4%
165
 
8.6%
163
 
8.5%
141
 
7.4%

Most occurring characters

ValueCountFrequency (%)
1603
31.1%
1581
30.7%
610
 
11.8%
165
 
3.2%
165
 
3.2%
165
 
3.2%
163
 
3.2%
141
 
2.7%
141
 
2.7%
141
 
2.7%
Other values (2) 282
 
5.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3389
65.7%
Uppercase Letter 1768
34.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
1581
46.7%
610
 
18.0%
165
 
4.9%
165
 
4.9%
163
 
4.8%
141
 
4.2%
141
 
4.2%
141
 
4.2%
141
 
4.2%
141
 
4.2%
Uppercase Letter
ValueCountFrequency (%)
1603
90.7%
165
 
9.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 5157
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
1603
31.1%
1581
30.7%
610
 
11.8%
165
 
3.2%
165
 
3.2%
165
 
3.2%
163
 
3.2%
141
 
2.7%
141
 
2.7%
141
 
2.7%
Other values (2) 282
 
5.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5157
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1603
31.1%
1581
30.7%
610
 
11.8%
165
 
3.2%
165
 
3.2%
165
 
3.2%
163
 
3.2%
141
 
2.7%
141
 
2.7%
141
 
2.7%
Other values (2) 282
 
5.5%

englishGrammar_2
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 26136
Missing (%) 91.7%
Memory size 960.6 KiB
Category counts 2122, 118, 59, 59

Length

Max length 10
Median length 5
Mean length 5.4003393
Min length 5

Characters and Unicode

Total characters 12734
Distinct characters 12
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
2122
 
7.4%
118
 
0.4%
59
 
0.2%
59
 
0.2%
(Missing) 26136
91.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2299
90.7%
118
 
4.7%
59
 
2.3%
59
 
2.3%

Most occurring characters

ValueCountFrequency (%)
2476
19.4%
2358
18.5%
2358
18.5%
2358
18.5%
2358
18.5%
236
 
1.9%
177
 
1.4%
118
 
0.9%
118
 
0.9%
59
 
0.5%
Other values (2) 118
 
0.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 12557
98.6%
Space Separator 177
 
1.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
2476
19.7%
2358
18.8%
2358
18.8%
2358
18.8%
2358
18.8%
236
 
1.9%
118
 
0.9%
118
 
0.9%
59
 
0.5%
59
 
0.5%
Space Separator
ValueCountFrequency (%)
177
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 12557
98.6%
Common 177
 
1.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
2476
19.7%
2358
18.8%
2358
18.8%
2358
18.8%
2358
18.8%
236
 
1.9%
118
 
0.9%
118
 
0.9%
59
 
0.5%
59
 
0.5%
Common
ValueCountFrequency (%)
177
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12734
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2476
19.4%
2358
18.5%
2358
18.5%
2358
18.5%
2358
18.5%
236
 
1.9%
177
 
1.4%
118
 
0.9%
118
 
0.9%
59
 
0.5%
Other values (2) 118
 
0.9%

englishGrammar_20
Categorical

Distinct 4
Distinct (%) 0.2%
Missing 26581
Missing (%) 93.3%
Memory size 948.9 KiB
Category counts 1345, 335, 141, 92

Length

Max length 9
Median length 7
Mean length 6.2205959
Min length 2

Characters and Unicode

Total characters 11900
Distinct characters 13
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1345
 
4.7%
335
 
1.2%
141
 
0.5%
92
 
0.3%
(Missing) 26581
93.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1345
70.3%
335
 
17.5%
141
 
7.4%
92
 
4.8%

Most occurring characters

ValueCountFrequency (%)
2690
22.6%
1719
14.4%
1680
14.1%
1670
14.0%
1437
12.1%
1345
11.3%
476
 
4.0%
233
 
2.0%
184
 
1.5%
141
 
1.2%
Other values (3) 325
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 11900
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
2690
22.6%
1719
14.4%
1680
14.1%
1670
14.0%
1437
12.1%
1345
11.3%
476
 
4.0%
233
 
2.0%
184
 
1.5%
141
 
1.2%
Other values (3) 325
 
2.7%

Most occurring scripts

ValueCountFrequency (%)
Latin 11900
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
2690
22.6%
1719
14.4%
1680
14.1%
1670
14.0%
1437
12.1%
1345
11.3%
476
 
4.0%
233
 
2.0%
184
 
1.5%
141
 
1.2%
Other values (3) 325
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11900
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2690
22.6%
1719
14.4%
1680
14.1%
1670
14.0%
1437
12.1%
1345
11.3%
476
 
4.0%
233
 
2.0%
184
 
1.5%
141
 
1.2%
Other values (3) 325
 
2.7%

englishGrammar_21
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 26595
Missing (%) 93.3%
Memory size 944.5 KiB
Category counts 1540, 213, 92, 54

Length

Max length 6
Median length 4
Mean length 4.0968931
Min length 4

Characters and Unicode

Total characters 7780
Distinct characters 13
Distinct categories 2
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1540
 
5.4%
213
 
0.7%
92
 
0.3%
54
 
0.2%
(Missing) 26595
93.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1540
81.1%
213
 
11.2%
92
 
4.8%
54
 
2.8%

Most occurring characters

ValueCountFrequency (%)
1845
23.7%
1540
19.8%
1540
19.8%
1540
19.8%
267
 
3.4%
213
 
2.7%
213
 
2.7%
184
 
2.4%
146
 
1.9%
92
 
1.2%
Other values (3) 200
 
2.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 6240
80.2%
Uppercase Letter 1540
 
19.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
1845
29.6%
1540
24.7%
1540
24.7%
267
 
4.3%
213
 
3.4%
213
 
3.4%
184
 
2.9%
146
 
2.3%
92
 
1.5%
92
 
1.5%
Other values (2) 108
 
1.7%
Uppercase Letter
ValueCountFrequency (%)
1540
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 7780
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
1845
23.7%
1540
19.8%
1540
19.8%
1540
19.8%
267
 
3.4%
213
 
2.7%
213
 
2.7%
184
 
2.4%
146
 
1.9%
92
 
1.2%
Other values (3) 200
 
2.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7780
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1845
23.7%
1540
19.8%
1540
19.8%
1540
19.8%
267
 
3.4%
213
 
2.7%
213
 
2.7%
184
 
2.4%
146
 
1.9%
92
 
1.2%
Other values (3) 200
 
2.6%

englishGrammar_22
Categorical

Distinct 4
Distinct (%) 0.2%
Missing 26624
Missing (%) 93.4%
Memory size 947.6 KiB
Category counts 1094, 500, 171, 105

Length

Max length 8
Median length 8
Mean length 6.2486631
Min length 3

Characters and Unicode

Total characters 11685
Distinct characters 12
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1094
 
3.8%
500
 
1.8%
171
 
0.6%
105
 
0.4%
(Missing) 26624
93.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1094
58.5%
500
26.7%
171
 
9.1%
105
 
5.6%

Most occurring characters

ValueCountFrequency (%)
2359
20.2%
2188
18.7%
1304
11.2%
1094
9.4%
1094
9.4%
1094
9.4%
671
 
5.7%
605
 
5.2%
500
 
4.3%
500
 
4.3%
Other values (2) 276
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 11685
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
2359
20.2%
2188
18.7%
1304
11.2%
1094
9.4%
1094
9.4%
1094
9.4%
671
 
5.7%
605
 
5.2%
500
 
4.3%
500
 
4.3%
Other values (2) 276
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 11685
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
2359
20.2%
2188
18.7%
1304
11.2%
1094
9.4%
1094
9.4%
1094
9.4%
671
 
5.7%
605
 
5.2%
500
 
4.3%
500
 
4.3%
Other values (2) 276
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11685
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2359
20.2%
2188
18.7%
1304
11.2%
1094
9.4%
1094
9.4%
1094
9.4%
671
 
5.7%
605
 
5.2%
500
 
4.3%
500
 
4.3%
Other values (2) 276
 
2.4%

englishGrammar_23
Categorical

Distinct 4
Distinct (%) 0.3%
Missing 27028
Missing (%) 94.9%
Memory size 936.6 KiB
Category counts 1048, 278, 118, 22

Length

Max length 8
Median length 8
Mean length 7.1446112
Min length 5

Characters and Unicode

Total characters 10474
Distinct characters 14
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1048
 
3.7%
278
 
1.0%
118
 
0.4%
22
 
0.1%
(Missing) 27028
94.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1048
71.5%
278
 
19.0%
118
 
8.0%
22
 
1.5%

Most occurring characters

ValueCountFrequency (%)
2374
22.7%
1348
12.9%
1070
10.2%
1048
10.0%
1048
10.0%
1048
10.0%
1048
10.0%
300
 
2.9%
278
 
2.7%
278
 
2.7%
Other values (4) 634
 
6.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 10474
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
2374
22.7%
1348
12.9%
1070
10.2%
1048
10.0%
1048
10.0%
1048
10.0%
1048
10.0%
300
 
2.9%
278
 
2.7%
278
 
2.7%
Other values (4) 634
 
6.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 10474
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
2374
22.7%
1348
12.9%
1070
10.2%
1048
10.0%
1048
10.0%
1048
10.0%
1048
10.0%
300
 
2.9%
278
 
2.7%
278
 
2.7%
Other values (4) 634
 
6.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10474
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2374
22.7%
1348
12.9%
1070
10.2%
1048
10.0%
1048
10.0%
1048
10.0%
1048
10.0%
300
 
2.9%
278
 
2.7%
278
 
2.7%
Other values (4) 634
 
6.1%

englishGrammar_24
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.3%
Missing 26993
Missing (%) 94.7%
Memory size 936.4 KiB
Category counts 1363, 106, 25, 7

Length

Max length 10
Median length 6
Mean length 6.2738175
Min length 5

Characters and Unicode

Total characters 9417
Distinct characters 12
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1363
 
4.8%
106
 
0.4%
25
 
0.1%
7
 
< 0.1%
(Missing) 26993
94.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1363
89.3%
106
 
6.9%
25
 
1.6%
25
 
1.6%
7
 
0.5%

Most occurring characters

ValueCountFrequency (%)
2839
30.1%
1600
17.0%
1469
15.6%
1388
14.7%
1370
14.5%
319
 
3.4%
138
 
1.5%
113
 
1.2%
106
 
1.1%
25
 
0.3%
Other values (2) 50
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 9392
99.7%
Space Separator 25
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
2839
30.2%
1600
17.0%
1469
15.6%
1388
14.8%
1370
14.6%
319
 
3.4%
138
 
1.5%
113
 
1.2%
106
 
1.1%
25
 
0.3%
Space Separator
ValueCountFrequency (%)
25
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 9392
99.7%
Common 25
 
0.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
2839
30.2%
1600
17.0%
1469
15.6%
1388
14.8%
1370
14.6%
319
 
3.4%
138
 
1.5%
113
 
1.2%
106
 
1.1%
25
 
0.3%
Common
ValueCountFrequency (%)
25
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9417
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2839
30.1%
1600
17.0%
1469
15.6%
1388
14.7%
1370
14.5%
319
 
3.4%
138
 
1.5%
113
 
1.2%
106
 
1.1%
25
 
0.3%
Other values (2) 50
 
0.5%

englishGrammar_25
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.3%
Missing 27031
Missing (%) 94.9%
Memory size 932.0 KiB
Category counts 1438, 13, 7, 5

Length

Max length 4
Median length 4
Mean length 3.9781271
Min length 2

Characters and Unicode

Total characters 5820
Distinct characters 5
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1438
 
5.0%
13
 
< 0.1%
7
 
< 0.1%
5
 
< 0.1%
(Missing) 27031
94.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1438
98.3%
13
 
0.9%
7
 
0.5%
5
 
0.3%

Most occurring characters

ValueCountFrequency (%)
1463
25.1%
1463
25.1%
1451
24.9%
1438
24.7%
5
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5820
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
1463
25.1%
1463
25.1%
1451
24.9%
1438
24.7%
5
 
0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 5820
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
1463
25.1%
1463
25.1%
1451
24.9%
1438
24.7%
5
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5820
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1463
25.1%
1463
25.1%
1451
24.9%
1438
24.7%
5
 
0.1%

englishGrammar_26
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.3%
Missing 27060
Missing (%) 95.0%
Memory size 941.0 KiB
Category counts 1400, 14, 13, 7

Length

Max length 13
Median length 11
Mean length 11.011158
Min length 9

Characters and Unicode

Total characters 15790
Distinct characters 16
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1400
 
4.9%
14
 
< 0.1%
13
 
< 0.1%
7
 
< 0.1%
(Missing) 27060
95.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1400
97.6%
14
 
1.0%
13
 
0.9%
7
 
0.5%

Most occurring characters

ValueCountFrequency (%)
2814
17.8%
2807
17.8%
1461
9.3%
1441
9.1%
1435
9.1%
1434
9.1%
1414
9.0%
1407
8.9%
1400
8.9%
41
 
0.3%
Other values (6) 136
 
0.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 15790
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
2814
17.8%
2807
17.8%
1461
9.3%
1441
9.1%
1435
9.1%
1434
9.1%
1414
9.0%
1407
8.9%
1400
8.9%
41
 
0.3%
Other values (6) 136
 
0.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 15790
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
2814
17.8%
2807
17.8%
1461
9.3%
1441
9.1%
1435
9.1%
1434
9.1%
1414
9.0%
1407
8.9%
1400
8.9%
41
 
0.3%
Other values (6) 136
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 15790
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2814
17.8%
2807
17.8%
1461
9.3%
1441
9.1%
1435
9.1%
1434
9.1%
1414
9.0%
1407
8.9%
1400
8.9%
41
 
0.3%
Other values (6) 136
 
0.9%

englishGrammar_27
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.3%
Missing 27097
Missing (%) 95.1%
Memory size 932.3 KiB
Category counts 1146, 94, 87, 70

Length

Max length 14
Median length 5
Mean length 5.5934145
Min length 4

Characters and Unicode

Total characters 7814
Distinct characters 13
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1146
 
4.0%
94
 
0.3%
87
 
0.3%
70
 
0.2%
(Missing) 27097
95.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1146
72.3%
94
 
5.9%
94
 
5.9%
94
 
5.9%
87
 
5.5%
70
 
4.4%

Most occurring characters

ValueCountFrequency (%)
1491
19.1%
1421
18.2%
1310
16.8%
1146
14.7%
1146
14.7%
282
 
3.6%
258
 
3.3%
258
 
3.3%
188
 
2.4%
87
 
1.1%
Other values (3) 227
 
2.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 7626
97.6%
Space Separator 188
 
2.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
1491
19.6%
1421
18.6%
1310
17.2%
1146
15.0%
1146
15.0%
282
 
3.7%
258
 
3.4%
258
 
3.4%
87
 
1.1%
87
 
1.1%
Other values (2) 140
 
1.8%
Space Separator
ValueCountFrequency (%)
188
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 7626
97.6%
Common 188
 
2.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
1491
19.6%
1421
18.6%
1310
17.2%
1146
15.0%
1146
15.0%
282
 
3.7%
258
 
3.4%
258
 
3.4%
87
 
1.1%
87
 
1.1%
Other values (2) 140
 
1.8%
Common
ValueCountFrequency (%)
188
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7814
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1491
19.1%
1421
18.2%
1310
16.8%
1146
14.7%
1146
14.7%
282
 
3.6%
258
 
3.3%
258
 
3.3%
188
 
2.4%
87
 
1.1%
Other values (3) 227
 
2.9%
Distinct 4
Distinct (%) 0.3%
Missing 27049
Missing (%) 94.9%
Memory size 932.1 KiB
Category counts 669, 604, 147, 25

Length

Max length 5
Median length 4
Mean length 4.4352941
Min length 4

Characters and Unicode

Total characters 6409
Distinct characters 8
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
669
 
2.3%
604
 
2.1%
147
 
0.5%
25
 
0.1%
(Missing) 27049
94.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
669
46.3%
604
41.8%
147
 
10.2%
25
 
1.7%

Most occurring characters

ValueCountFrequency (%)
1445
22.5%
1445
22.5%
1298
20.3%
841
13.1%
604
9.4%
604
9.4%
147
 
2.3%
25
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 6409
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
1445
22.5%
1445
22.5%
1298
20.3%
841
13.1%
604
9.4%
604
9.4%
147
 
2.3%
25
 
0.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 6409
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
1445
22.5%
1445
22.5%
1298
20.3%
841
13.1%
604
9.4%
604
9.4%
147
 
2.3%
25
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6409
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1445
22.5%
1445
22.5%
1298
20.3%
841
13.1%
604
9.4%
604
9.4%
147
 
2.3%
25
 
0.4%

englishGrammar_3
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 26059
Missing (%) 91.5%
Memory size 973.2 KiB
Category counts 2187, 121, 115, 12

Length

Max length 13
Median length 10
Mean length 9.7687885
Min length 6

Characters and Unicode

Total characters 23787
Distinct characters 11
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
2187
 
7.7%
121
 
0.4%
115
 
0.4%
12
 
< 0.1%
(Missing) 26059
91.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2199
85.8%
236
 
9.2%
127
 
5.0%

Most occurring characters

ValueCountFrequency (%)
4634
19.5%
2562
10.8%
2435
10.2%
2435
10.2%
2435
10.2%
2435
10.2%
2199
9.2%
2199
9.2%
2199
9.2%
127
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 23660
99.5%
Space Separator 127
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
4634
19.6%
2562
10.8%
2435
10.3%
2435
10.3%
2435
10.3%
2435
10.3%
2199
9.3%
2199
9.3%
2199
9.3%
127
 
0.5%
Space Separator
ValueCountFrequency (%)
127
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 23660
99.5%
Common 127
 
0.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
4634
19.6%
2562
10.8%
2435
10.3%
2435
10.3%
2435
10.3%
2435
10.3%
2199
9.3%
2199
9.3%
2199
9.3%
127
 
0.5%
Common
ValueCountFrequency (%)
127
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 23787
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4634
19.5%
2562
10.8%
2435
10.2%
2435
10.2%
2435
10.2%
2435
10.2%
2199
9.2%
2199
9.2%
2199
9.2%
127
 
0.5%

englishGrammar_4
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 26084
Missing (%) 91.5%
Memory size 993.5 KiB
Category counts 2109, 128, 121, 52

Length

Max length 19
Median length 19
Mean length 18.728631
Min length 16

Characters and Unicode

Total characters 45136
Distinct characters 16
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
2109
 
7.4%
128
 
0.4%
121
 
0.4%
52
 
0.2%
(Missing) 26084
91.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2358
32.9%
2282
31.8%
2237
31.2%
249
 
3.5%
52
 
0.7%

Most occurring characters

ValueCountFrequency (%)
4820
10.7%
4820
10.7%
4768
10.6%
4692
10.4%
2531
 
5.6%
2410
 
5.3%
2410
 
5.3%
2410
 
5.3%
2410
 
5.3%
2289
 
5.1%
Other values (6) 11576
25.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 40368
89.4%
Space Separator 4768
 
10.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
4820
11.9%
4820
11.9%
4692
11.6%
2531
 
6.3%
2410
 
6.0%
2410
 
6.0%
2410
 
6.0%
2410
 
6.0%
2289
 
5.7%
2289
 
5.7%
Other values (5) 9287
23.0%
Space Separator
ValueCountFrequency (%)
4768
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 40368
89.4%
Common 4768
 
10.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
4820
11.9%
4820
11.9%
4692
11.6%
2531
 
6.3%
2410
 
6.0%
2410
 
6.0%
2410
 
6.0%
2410
 
6.0%
2289
 
5.7%
2289
 
5.7%
Other values (5) 9287
23.0%
Common
ValueCountFrequency (%)
4768
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 45136
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4820
10.7%
4820
10.7%
4768
10.6%
4692
10.4%
2531
 
5.6%
2410
 
5.3%
2410
 
5.3%
2410
 
5.3%
2410
 
5.3%
2289
 
5.1%
Other values (6) 11576
25.6%

englishGrammar_5
Categorical

Distinct 4
Distinct (%) 0.2%
Missing 26666
Missing (%) 93.6%
Memory size 943.5 KiB
Category counts 1238, 234, 213, 143

Length

Max length 8
Median length 4
Mean length 4.6438731
Min length 3

Characters and Unicode

Total characters 8489
Distinct characters 13
Distinct categories 3
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1238
 
4.3%
234
 
0.8%
213
 
0.7%
143
 
0.5%
(Missing) 26666
93.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1238
67.7%
234
 
12.8%
213
 
11.7%
143
 
7.8%

Most occurring characters

ValueCountFrequency (%)
1451
17.1%
1381
16.3%
1238
14.6%
1238
14.6%
824
9.7%
447
 
5.3%
447
 
5.3%
356
 
4.2%
234
 
2.8%
234
 
2.8%
Other values (3) 639
7.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 8021
94.5%
Uppercase Letter 234
 
2.8%
Other Punctuation 234
 
2.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
1451
18.1%
1381
17.2%
1238
15.4%
1238
15.4%
824
10.3%
447
 
5.6%
447
 
5.6%
356
 
4.4%
213
 
2.7%
213
 
2.7%
Uppercase Letter
ValueCountFrequency (%)
234
100.0%
Other Punctuation
ValueCountFrequency (%)
234
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 8255
97.2%
Common 234
 
2.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
1451
17.6%
1381
16.7%
1238
15.0%
1238
15.0%
824
10.0%
447
 
5.4%
447
 
5.4%
356
 
4.3%
234
 
2.8%
213
 
2.6%
Other values (2) 426
 
5.2%
Common
ValueCountFrequency (%)
234
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8489
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1451
17.1%
1381
16.3%
1238
14.6%
1238
14.6%
824
9.7%
447
 
5.3%
447
 
5.3%
356
 
4.2%
234
 
2.8%
234
 
2.8%
Other values (3) 639
7.5%

englishGrammar_6
Categorical

IMBALANCE  MISSING 

Distinct: 4
Distinct (%): 0.2%
Missing: 26676
Missing (%): 93.6%
Memory size: 945.4 KiB
1501 
 
139
 
107
 
71

Length

Max length: 8
Median length: 6
Mean length: 5.9075908
Min length: 3

Characters and Unicode

Total characters: 10740
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

ValueCountFrequency (%)
1501
 
5.3%
139
 
0.5%
107
 
0.4%
71
 
0.2%
(Missing) 26676
93.6%
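
The IMBALANCE flag on this variable can be understood with a small sketch. One common way to score class imbalance is one minus the Shannon entropy of the class frequencies, normalised by the maximum entropy for that number of classes. That this is the exact formula behind the report's flag is an assumption, not something the report states; the function name `imbalance_score` is hypothetical. The counts are the four non-missing frequencies listed above.

```python
import math


def imbalance_score(counts):
    """Return 1 - (entropy / max entropy); 0 means balanced, 1 means one class only.

    Assumption: this mirrors the kind of score a profiling tool might use,
    but the report itself does not confirm the formula.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    n_classes = len(probs)
    if n_classes <= 1:
        return 0.0
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1.0 - entropy / math.log2(n_classes)


# Non-missing value counts for englishGrammar_6, as listed in Common Values.
score = imbalance_score([1501, 139, 107, 71])
print(f"imbalance score: {score:.3f}")
```

A perfectly uniform variable scores 0 under this definition, so a higher score means the answers are concentrated in fewer categories.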

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1501
78.0%
139
 
7.2%
107
 
5.6%
107
 
5.6%
71
 
3.7%

Most occurring characters

ValueCountFrequency (%)
1711
15.9%
1679
15.6%
1608
15.0%
1572
14.6%
1501
14.0%
1501
14.0%
495
 
4.6%
142
 
1.3%
139
 
1.3%
107
 
1.0%
Other values (3) 285
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 10633
99.0%
Space Separator 107
 
1.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
1711
16.1%
1679
15.8%
1608
15.1%
1572
14.8%
1501
14.1%
1501
14.1%
495
 
4.7%
142
 
1.3%
139
 
1.3%
107
 
1.0%
Other values (2) 178
 
1.7%
Space Separator
ValueCountFrequency (%)
107
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 10633
99.0%
Common 107
 
1.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
1711
16.1%
1679
15.8%
1608
15.1%
1572
14.8%
1501
14.1%
1501
14.1%
495
 
4.7%
142
 
1.3%
139
 
1.3%
107
 
1.0%
Other values (2) 178
 
1.7%
Common
ValueCountFrequency (%)
107
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10740
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1711
15.9%
1679
15.6%
1608
15.0%
1572
14.6%
1501
14.0%
1501
14.0%
495
 
4.6%
142
 
1.3%
139
 
1.3%
107
 
1.0%
Other values (3) 285
 
2.7%

englishGrammar_7
Categorical

Distinct: 4
Distinct (%): 0.2%
Missing: 26646
Missing (%): 93.5%
Memory size: 945.1 KiB
1097 
307 
224 
220 

Length

Max length: 9
Median length: 5
Mean length: 5.2245671
Min length: 2

Characters and Unicode

Total characters: 9655
Distinct characters: 11
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

ValueCountFrequency (%)
1097
 
3.8%
307
 
1.1%
224
 
0.8%
220
 
0.8%
(Missing) 26646
93.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1097
59.4%
307
 
16.6%
224
 
12.1%
220
 
11.9%

Most occurring characters

ValueCountFrequency (%)
3306
34.2%
1541
16.0%
1541
16.0%
1404
14.5%
531
 
5.5%
224
 
2.3%
224
 
2.3%
224
 
2.3%
220
 
2.3%
220
 
2.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 9215
95.4%
Uppercase Letter 220
 
2.3%
Other Punctuation 220
 
2.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
3306
35.9%
1541
16.7%
1541
16.7%
1404
15.2%
531
 
5.8%
224
 
2.4%
224
 
2.4%
224
 
2.4%
220
 
2.4%
Uppercase Letter
ValueCountFrequency (%)
220
100.0%
Other Punctuation
ValueCountFrequency (%)
220
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 9435
97.7%
Common 220
 
2.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
3306
35.0%
1541
16.3%
1541
16.3%
1404
14.9%
531
 
5.6%
224
 
2.4%
224
 
2.4%
224
 
2.4%
220
 
2.3%
220
 
2.3%
Common
ValueCountFrequency (%)
220
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9655
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3306
34.2%
1541
16.0%
1541
16.0%
1404
14.5%
531
 
5.5%
224
 
2.3%
224
 
2.3%
224
 
2.3%
220
 
2.3%
220
 
2.3%

englishGrammar_8
Categorical

IMBALANCE  MISSING 

Distinct: 4
Distinct (%): 0.2%
Missing: 26600
Missing (%): 93.4%
Memory size: 954.9 KiB
1639 
 
137
 
62
 
56

Length

Max length: 12
Median length: 10
Mean length: 9.7740232
Min length: 3

Characters and Unicode

Total characters: 18512
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

ValueCountFrequency (%)
1639
 
5.8%
137
 
0.5%
62
 
0.2%
56
 
0.2%
(Missing) 26600
93.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1639
86.5%
137
 
7.2%
62
 
3.3%
56
 
3.0%

Most occurring characters

ValueCountFrequency (%)
3415
18.4%
1975
10.7%
1913
10.3%
1894
10.2%
1776
9.6%
1776
9.6%
1701
9.2%
1639
8.9%
1639
8.9%
274
 
1.5%
Other values (4) 510
 
2.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 18512
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
3415
18.4%
1975
10.7%
1913
10.3%
1894
10.2%
1776
9.6%
1776
9.6%
1701
9.2%
1639
8.9%
1639
8.9%
274
 
1.5%
Other values (4) 510
 
2.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 18512
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
3415
18.4%
1975
10.7%
1913
10.3%
1894
10.2%
1776
9.6%
1776
9.6%
1701
9.2%
1639
8.9%
1639
8.9%
274
 
1.5%
Other values (4) 510
 
2.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 18512
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3415
18.4%
1975
10.7%
1913
10.3%
1894
10.2%
1776
9.6%
1776
9.6%
1701
9.2%
1639
8.9%
1639
8.9%
274
 
1.5%
Other values (4) 510
 
2.8%

englishGrammar_9
Categorical

Distinct: 4
Distinct (%): 0.3%
Missing: 27001
Missing (%): 94.8%
Memory size: 936.0 KiB
1068 
383 
 
26
 
16

Length

Max length: 7
Median length: 6
Mean length: 6.1828533
Min length: 3

Characters and Unicode

Total characters: 9231
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

ValueCountFrequency (%)
1068
 
3.7%
383
 
1.3%
26
 
0.1%
16
 
0.1%
(Missing) 27001
94.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1068
71.5%
383
 
25.7%
26
 
1.7%
16
 
1.1%

Most occurring characters

ValueCountFrequency (%)
1451
15.7%
1451
15.7%
1451
15.7%
1094
11.9%
1094
11.9%
1094
11.9%
798
8.6%
399
 
4.3%
399
 
4.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 9231
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
1451
15.7%
1451
15.7%
1451
15.7%
1094
11.9%
1094
11.9%
1094
11.9%
798
8.6%
399
 
4.3%
399
 
4.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 9231
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
1451
15.7%
1451
15.7%
1451
15.7%
1094
11.9%
1094
11.9%
1094
11.9%
798
8.6%
399
 
4.3%
399
 
4.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9231
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1451
15.7%
1451
15.7%
1451
15.7%
1094
11.9%
1094
11.9%
1094
11.9%
798
8.6%
399
 
4.3%
399
 
4.3%

final.accuracy
Real number (ℝ)

Distinct: 742
Distinct (%): 24.4%
Missing: 25449
Missing (%): 89.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 92.063202
Minimum: 0
Maximum: 100
Zeros: 26
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 222.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 26.592
Q1: 98.14
Median: 99.67
Q3: 100
95-th percentile: 100
Maximum: 100
Range: 100
Interquartile range (IQR): 1.86

Descriptive statistics

Standard deviation: 21.422897
Coefficient of variation (CV): 0.23269772
Kurtosis: 8.6335598
Mean: 92.063202
Median Absolute Deviation (MAD): 0.33
Skewness: -3.1187167
Sum: 280332.45
Variance: 458.94052
Monotonicity: Not monotonic
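
Several of the figures above are tied together by simple identities, which makes them easy to sanity-check. The sketch below recomputes the coefficient of variation, variance, mean, and IQR for final.accuracy from the other reported values. The variable names are illustrative only, and the total observation count (28494) is taken from the report's dataset overview.

```python
import math

# Values copied from the report for final.accuracy.
n_observations = 28494   # from the dataset overview
n_missing = 25449
mean = 92.063202
std_dev = 21.422897
variance = 458.94052
total = 280332.45
q1, q3 = 98.14, 100.0

n_valid = n_observations - n_missing      # rows that actually have a value

cv = std_dev / mean                       # coefficient of variation
variance_from_std = std_dev ** 2          # variance is the squared std deviation
mean_from_sum = total / n_valid           # mean is the sum over non-missing rows
iqr = q3 - q1                             # interquartile range

print(f"valid rows: {n_valid}")
print(f"CV: {cv:.8f}")
print(f"variance from std: {variance_from_std:.5f}")
print(f"mean from sum: {mean_from_sum:.6f}")
print(f"IQR: {iqr:.2f}")
```

The recomputed values agree with the reported ones up to rounding, which suggests the statistics are computed over the 3045 non-missing rows only.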
Histogram with fixed size bins (bins=50)
Value                 Count    Frequency (%)
100                   1494     5.2%
0                     26       0.1%
99.44                 19       0.1%
99.4                  18       0.1%
99.53                 18       0.1%
99.29                 17       0.1%
99.35                 16       0.1%
99.39                 16       0.1%
99.45                 14       < 0.1%
99.52                 14       < 0.1%
Other values (732)    1393     4.9%
(Missing)             25449    89.3%

Value                 Count    Frequency (%)
0                     26       0.1%
1.01                  1        < 0.1%
1.56                  1        < 0.1%
3.13                  1        < 0.1%
3.23                  1        < 0.1%
3.25                  1        < 0.1%
3.47                  1        < 0.1%
4.29                  1        < 0.1%
4.85                  1        < 0.1%
4.92                  1        < 0.1%

Value                 Count    Frequency (%)
100                   1494     5.2%
99.79                 2        < 0.1%
99.77                 1        < 0.1%
99.73                 5        < 0.1%
99.72                 4        < 0.1%
99.71                 1        < 0.1%
99.7                  1        < 0.1%
99.69                 6        < 0.1%
99.68                 4        < 0.1%
99.67                 6        < 0.1%

final.wpm
Real number (ℝ)

Distinct: 360
Distinct (%): 11.8%
Missing: 25449
Missing (%): 89.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.777603
Minimum: 0
Maximum: 143.4
Zeros: 24
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 222.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 15.8
Q1: 27.6
Median: 34.6
Q3: 42.6
95-th percentile: 58.6
Maximum: 143.4
Range: 143.4
Interquartile range (IQR): 15
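
For readers who want to reproduce this kind of quantile summary on their own data, here is a minimal standard-library sketch. It assumes linear interpolation between order statistics, which is a common default but is not confirmed by the report, and it drops missing values first. The function names and the sample values are hypothetical.

```python
import math
from statistics import median


def quantile(values, q):
    """Quantile with linear interpolation between neighbouring order statistics."""
    data = sorted(values)
    if not data:
        raise ValueError("quantile() needs at least one value")
    position = q * (len(data) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return data[lower] + (data[upper] - data[lower]) * fraction


def summarise(values):
    """Quantile summary over the non-missing entries of a numeric column."""
    clean = [v for v in values if v is not None and not math.isnan(v)]
    med = median(clean)
    q1 = quantile(clean, 0.25)
    q3 = quantile(clean, 0.75)
    return {
        "minimum": min(clean),
        "q1": q1,
        "median": med,
        "q3": q3,
        "maximum": max(clean),
        "range": max(clean) - min(clean),
        "iqr": q3 - q1,
        # Median absolute deviation: median distance from the median.
        "mad": median(abs(v - med) for v in clean),
    }


# Hypothetical words-per-minute scores, with one missing entry.
summary = summarise([27.6, 34.6, None, 42.6, 15.8, 58.6])
print(summary)
```

Because the median and IQR ignore the extreme tails, they are less affected than the mean and range by outliers such as the 143.4 maximum reported here.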

Descriptive statistics

Standard deviation: 14.576262
Coefficient of variation (CV): 0.40741305
Kurtosis: 6.6699444
Mean: 35.777603
Median Absolute Deviation (MAD): 7.6
Skewness: 1.3918735
Sum: 108942.8
Variance: 212.46742
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count    Frequency (%)
33.4                  52       0.2%
35.6                  45       0.2%
36.4                  38       0.1%
33.2                  36       0.1%
33                    33       0.1%
32.6                  31       0.1%
34.6                  30       0.1%
33.6                  30       0.1%
28.2                  30       0.1%
30.2                  30       0.1%
Other values (350)    2690     9.4%
(Missing)             25449    89.3%

Value                 Count    Frequency (%)
0                     24       0.1%
0.6                   1        < 0.1%
1.2                   2        < 0.1%
1.4                   1        < 0.1%
2                     1        < 0.1%
3.2                   2        < 0.1%
3.4                   1        < 0.1%
3.6                   4        < 0.1%
4.4                   4        < 0.1%
4.6                   1        < 0.1%

Value                 Count    Frequency (%)
143.4                 1        < 0.1%
137                   1        < 0.1%
126.4                 1        < 0.1%
126.2                 3        < 0.1%
123.8                 1        < 0.1%
122.8                 1        < 0.1%
118                   2        < 0.1%
117.8                 1        < 0.1%
117.6                 1        < 0.1%
114.6                 1        < 0.1%

listening_1
Categorical

IMBALANCE  MISSING 

Distinct: 4
Distinct (%): 0.1%
Missing: 25236
Missing (%): 88.6%
Memory size: 1.0 MiB
2753 
 
235
 
164
 
106

Length

Max length: 30
Median length: 27
Mean length: 26.845918
Min length: 22

Characters and Unicode

Total characters: 87464
Distinct characters: 25
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

ValueCountFrequency (%)
2753
 
9.7%
235
 
0.8%
164
 
0.6%
106
 
0.4%
(Missing) 25236
88.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2917
21.5%
2917
21.5%
2753
20.3%
2753
20.3%
341
 
2.5%
235
 
1.7%
235
 
1.7%
235
 
1.7%
235
 
1.7%
235
 
1.7%
Other values (6) 694
 
5.1%

Most occurring characters

ValueCountFrequency (%)
11845
13.5%
10292
11.8%
6622
 
7.6%
6587
 
7.5%
6481
 
7.4%
6329
 
7.2%
3622
 
4.1%
3599
 
4.1%
3258
 
3.7%
3152
 
3.6%
Other values (15) 25677
29.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 67739
77.4%
Space Separator 10292
 
11.8%
Other Punctuation 6175
 
7.1%
Uppercase Letter 3258
 
3.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
11845
17.5%
6622
9.8%
6587
9.7%
6481
9.6%
6329
9.3%
3622
 
5.3%
3599
 
5.3%
3152
 
4.7%
3129
 
4.6%
3023
 
4.5%
Other values (10) 13350
19.7%
Other Punctuation
ValueCountFrequency (%)
3258
52.8%
2917
47.2%
Uppercase Letter
ValueCountFrequency (%)
3023
92.8%
235
 
7.2%
Space Separator
ValueCountFrequency (%)
10292
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 70997
81.2%
Common 16467
 
18.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
11845
16.7%
6622
 
9.3%
6587
 
9.3%
6481
 
9.1%
6329
 
8.9%
3622
 
5.1%
3599
 
5.1%
3152
 
4.4%
3129
 
4.4%
3023
 
4.3%
Other values (12) 16608
23.4%
Common
ValueCountFrequency (%)
10292
62.5%
3258
 
19.8%
2917
 
17.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 87464
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
11845
13.5%
10292
11.8%
6622
 
7.6%
6587
 
7.5%
6481
 
7.4%
6329
 
7.2%
3622
 
4.1%
3599
 
4.1%
3258
 
3.7%
3152
 
3.6%
Other values (15) 25677
29.4%

listening_10
Categorical

Distinct: 4
Distinct (%): 0.1%
Missing: 25222
Missing (%): 88.5%
Memory size: 1.2 MiB
1877 
652 
382 
361 

Length

Max length: 97
Median length: 55
Mean length: 66.081296
Min length: 55

Characters and Unicode

Total characters: 216218
Distinct characters: 36
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

ValueCountFrequency (%)
1877
 
6.6%
652
 
2.3%
382
 
1.3%
361
 
1.3%
(Missing) 25222
88.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
3272
 
10.6%
1877
 
6.1%
1877
 
6.1%
1877
 
6.1%
1877
 
6.1%
1877
 
6.1%
1877
 
6.1%
1034
 
3.4%
1013
 
3.3%
652
 
2.1%
Other values (28) 13498
43.9%

Most occurring characters

ValueCountFrequency (%)
27459
 
12.7%
20142
 
9.3%
14574
 
6.7%
14492
 
6.7%
13067
 
6.0%
11963
 
5.5%
11481
 
5.3%
11332
 
5.2%
10738
 
5.0%
8479
 
3.9%
Other values (26) 72491
33.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 170301
78.8%
Space Separator 27459
 
12.7%
Uppercase Letter 7678
 
3.6%
Decimal Number 7508
 
3.5%
Other Punctuation 3272
 
1.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
20142
11.8%
14574
 
8.6%
14492
 
8.5%
13067
 
7.7%
11963
 
7.0%
11481
 
6.7%
11332
 
6.7%
10738
 
6.3%
8479
 
5.0%
7515
 
4.4%
Other values (14) 46518
27.3%
Uppercase Letter
ValueCountFrequency (%)
1877
24.4%
1877
24.4%
1877
24.4%
743
 
9.7%
652
 
8.5%
652
 
8.5%
Decimal Number
ValueCountFrequency (%)
1877
25.0%
1877
25.0%
1877
25.0%
1877
25.0%
Space Separator
ValueCountFrequency (%)
27459
100.0%
Other Punctuation
ValueCountFrequency (%)
3272
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 177979
82.3%
Common 38239
 
17.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
20142
 
11.3%
14574
 
8.2%
14492
 
8.1%
13067
 
7.3%
11963
 
6.7%
11481
 
6.5%
11332
 
6.4%
10738
 
6.0%
8479
 
4.8%
7515
 
4.2%
Other values (20) 54196
30.5%
Common
ValueCountFrequency (%)
27459
71.8%
3272
 
8.6%
1877
 
4.9%
1877
 
4.9%
1877
 
4.9%
1877
 
4.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 216218
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
27459
 
12.7%
20142
 
9.3%
14574
 
6.7%
14492
 
6.7%
13067
 
6.0%
11963
 
5.5%
11481
 
5.3%
11332
 
5.2%
10738
 
5.0%
8479
 
3.9%
Other values (26) 72491
33.5%

listening_2
Categorical

Distinct: 4
Distinct (%): 0.1%
Missing: 25222
Missing (%): 88.5%
Memory size: 1.1 MiB
2159 
596 
340 
 
177

Length

Max length: 40
Median length: 36
Mean length: 33.526589
Min length: 23

Characters and Unicode

Total characters: 109699
Distinct characters: 26
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

ValueCountFrequency (%)
2159
 
7.6%
596
 
2.1%
340
 
1.2%
177
 
0.6%
(Missing) 25222
88.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2755
13.1%
2336
11.1%
2159
10.3%
2159
10.3%
2159
10.3%
2159
10.3%
2159
10.3%
773
 
3.7%
596
 
2.8%
596
 
2.8%
Other values (10) 3167
15.1%

Most occurring characters

ValueCountFrequency (%)
17746
16.2%
12236
11.2%
9912
 
9.0%
8526
 
7.8%
7382
 
6.7%
6994
 
6.4%
5855
 
5.3%
5785
 
5.3%
4385
 
4.0%
3775
 
3.4%
Other values (16) 27103
24.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 85409
77.9%
Space Separator 17746
 
16.2%
Other Punctuation 3272
 
3.0%
Uppercase Letter 3272
 
3.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
12236
14.3%
9912
11.6%
8526
10.0%
7382
8.6%
6994
8.2%
5855
 
6.9%
5785
 
6.8%
4385
 
5.1%
3775
 
4.4%
3095
 
3.6%
Other values (10) 17464
20.4%
Uppercase Letter
ValueCountFrequency (%)
2159
66.0%
773
 
23.6%
340
 
10.4%
Other Punctuation
ValueCountFrequency (%)
2499
76.4%
773
 
23.6%
Space Separator
ValueCountFrequency (%)
17746
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 88681
80.8%
Common 21018
 
19.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
12236
13.8%
9912
11.2%
8526
9.6%
7382
 
8.3%
6994
 
7.9%
5855
 
6.6%
5785
 
6.5%
4385
 
4.9%
3775
 
4.3%
3095
 
3.5%
Other values (13) 20736
23.4%
Common
ValueCountFrequency (%)
17746
84.4%
2499
 
11.9%
773
 
3.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 109699
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
17746
16.2%
12236
11.2%
9912
 
9.0%
8526
 
7.8%
7382
 
6.7%
6994
 
6.4%
5855
 
5.3%
5785
 
5.3%
4385
 
4.0%
3775
 
3.4%
Other values (16) 27103
24.7%

listening_3
Categorical

IMBALANCE  MISSING 

Distinct: 4
Distinct (%): 0.1%
Missing: 25240
Missing (%): 88.6%
Memory size: 1.1 MiB
2872 
 
268
 
58
 
56

Length

Max length: 54
Median length: 39
Mean length: 40.238476
Min length: 34

Characters and Unicode

Total characters: 130936
Distinct characters: 27
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

ValueCountFrequency (%)
2872
 
10.1%
268
 
0.9%
58
 
0.2%
56
 
0.2%
(Missing) 25240
88.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2872
13.9%
2872
13.9%
2872
13.9%
2872
13.9%
2872
13.9%
2872
13.9%
498
 
2.4%
268
 
1.3%
268
 
1.3%
268
 
1.3%
Other values (16) 2178
10.5%

Most occurring characters

ValueCountFrequency (%)
20272
15.5%
17726
13.5%
11984
 
9.2%
9878
 
7.5%
9438
 
7.2%
9436
 
7.2%
6452
 
4.9%
6070
 
4.6%
4326
 
3.3%
4174
 
3.2%
Other values (17) 31180
23.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 103830
79.3%
Space Separator 17726
 
13.5%
Other Punctuation 6126
 
4.7%
Uppercase Letter 3254
 
2.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
20272
19.5%
11984
11.5%
9878
9.5%
9438
9.1%
9436
9.1%
6452
 
6.2%
6070
 
5.8%
4326
 
4.2%
4174
 
4.0%
3752
 
3.6%
Other values (11) 18048
17.4%
Uppercase Letter
ValueCountFrequency (%)
2872
88.3%
326
 
10.0%
56
 
1.7%
Other Punctuation
ValueCountFrequency (%)
3254
53.1%
2872
46.9%
Space Separator
ValueCountFrequency (%)
17726
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 107084
81.8%
Common 23852
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
20272
18.9%
11984
11.2%
9878
9.2%
9438
8.8%
9436
8.8%
6452
 
6.0%
6070
 
5.7%
4326
 
4.0%
4174
 
3.9%
3752
 
3.5%
Other values (14) 21302
19.9%
Common
ValueCountFrequency (%)
17726
74.3%
3254
 
13.6%
2872
 
12.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 130936
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
20272
15.5%
17726
13.5%
11984
 
9.2%
9878
 
7.5%
9438
 
7.2%
9436
 
7.2%
6452
 
4.9%
6070
 
4.6%
4326
 
3.3%
4174
 
3.2%
Other values (17) 31180
23.8%

listening_4
Categorical

IMBALANCE  MISSING 

Distinct: 4
Distinct (%): 0.1%
Missing: 25219
Missing (%): 88.5%
Memory size: 1.1 MiB
2719 
 
262
 
174
 
120

Length

Max length: 52
Median length: 47
Mean length: 46.412519
Min length: 39

Characters and Unicode

Total characters: 152001
Distinct characters: 24
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

ValueCountFrequency (%)
2719
 
9.5%
262
 
0.9%
174
 
0.6%
120
 
0.4%
(Missing) 25219
88.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
3275
12.7%
2839
11.0%
2719
10.5%
2719
10.5%
2719
10.5%
2719
10.5%
2719
10.5%
2719
10.5%
436
 
1.7%
262
 
1.0%
Other values (15) 2746
10.6%

Most occurring characters

ValueCountFrequency (%)
22597
14.9%
18766
12.3%
18386
12.1%
9737
 
6.4%
9497
 
6.2%
9267
 
6.1%
8887
 
5.8%
8691
 
5.7%
6866
 
4.5%
6288
 
4.1%
Other values (14) 33019
21.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 122854
80.8%
Space Separator 22597
 
14.9%
Other Punctuation 3275
 
2.2%
Uppercase Letter 3275
 
2.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
18766
15.3%
18386
15.0%
9737
7.9%
9497
7.7%
9267
7.5%
8887
 
7.2%
8691
 
7.1%
6866
 
5.6%
6288
 
5.1%
4093
 
3.3%
Other values (11) 22376
18.2%
Space Separator
ValueCountFrequency (%)
22597
100.0%
Other Punctuation
ValueCountFrequency (%)
3275
100.0%
Uppercase Letter
ValueCountFrequency (%)
3275
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 126129
83.0%
Common 25872
 
17.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
18766
14.9%
18386
14.6%
9737
 
7.7%
9497
 
7.5%
9267
 
7.3%
8887
 
7.0%
8691
 
6.9%
6866
 
5.4%
6288
 
5.0%
4093
 
3.2%
Other values (12) 25651
20.3%
Common
ValueCountFrequency (%)
22597
87.3%
3275
 
12.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 152001
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
22597
14.9%
18766
12.3%
18386
12.1%
9737
 
6.4%
9497
 
6.2%
9267
 
6.1%
8887
 
5.8%
8691
 
5.7%
6866
 
4.5%
6288
 
4.1%
Other values (14) 33019
21.7%

listening_5
Categorical

Distinct: 4
Distinct (%): 0.1%
Missing: 25238
Missing (%): 88.6%
Memory size: 1.1 MiB
2137 
786 
216 
 
117

Length

Max length: 44
Median length: 34
Mean length: 36.469287
Min length: 30

Characters and Unicode

Total characters: 118744
Distinct characters: 28
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

ValueCountFrequency (%)
2137
 
7.5%
786
 
2.8%
216
 
0.8%
117
 
0.4%
(Missing) 25238
88.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2137
10.8%
2137
10.8%
2137
10.8%
2137
10.8%
2137
10.8%
1002
 
5.0%
1002
 
5.0%
1002
 
5.0%
786
 
4.0%
786
 
4.0%
Other values (13) 4593
23.1%

Most occurring characters

ValueCountFrequency (%)
16600
14.0%
12678
 
10.7%
11572
 
9.7%
9518
 
8.0%
6197
 
5.2%
5963
 
5.0%
5492
 
4.6%
5273
 
4.4%
5060
 
4.3%
4708
 
4.0%
Other values (18) 35683
30.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 92259
77.7%
Space Separator 16600
 
14.0%
Other Punctuation 6629
 
5.6%
Uppercase Letter 3256
 
2.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
12678
13.7%
11572
12.5%
9518
10.3%
6197
 
6.7%
5963
 
6.5%
5492
 
6.0%
5273
 
5.7%
5060
 
5.5%
4708
 
5.1%
4706
 
5.1%
Other values (10) 21092
22.9%
Uppercase Letter
ValueCountFrequency (%)
2137
65.6%
786
 
24.1%
216
 
6.6%
117
 
3.6%
Other Punctuation
ValueCountFrequency (%)
3256
49.1%
2254
34.0%
1119
 
16.9%
Space Separator
ValueCountFrequency (%)
16600
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 95515
80.4%
Common 23229
 
19.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
12678
13.3%
11572
12.1%
9518
 
10.0%
6197
 
6.5%
5963
 
6.2%
5492
 
5.7%
5273
 
5.5%
5060
 
5.3%
4708
 
4.9%
4706
 
4.9%
Other values (14) 24348
25.5%
Common
ValueCountFrequency (%)
16600
71.5%
3256
 
14.0%
2254
 
9.7%
1119
 
4.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 118744
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
16600
14.0%
12678
 
10.7%
11572
 
9.7%
9518
 
8.0%
6197
 
5.2%
5963
 
5.0%
5492
 
4.6%
5273
 
4.4%
5060
 
4.3%
4708
 
4.0%
Other values (18) 35683
30.1%

listening_6
Categorical

Distinct: 4
Distinct (%): 0.1%
Missing: 25215
Missing (%): 88.5%
Memory size: 1.1 MiB
1727 
928 
547 
 
77

Length

Max length: 68
Median length: 66
Mean length: 63.130528
Min length: 57

Characters and Unicode

Total characters: 207005
Distinct characters: 30
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

ValueCountFrequency (%)
1727
 
6.1%
928
 
3.3%
547
 
1.9%
77
 
0.3%
(Missing) 25215
88.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
3202
 
9.3%
3202
 
9.3%
2655
 
7.7%
1856
 
5.4%
1727
 
5.0%
1727
 
5.0%
1727
 
5.0%
1727
 
5.0%
1727
 
5.0%
1727
 
5.0%
Other values (28) 13142
38.2%

Most occurring characters

ValueCountFrequency (%)
31140
15.0%
23457
11.3%
23091
11.2%
15578
 
7.5%
13518
 
6.5%
11641
 
5.6%
10307
 
5.0%
8171
 
3.9%
7806
 
3.8%
7130
 
3.4%
Other values (20) 55166
26.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 165011
79.7%
Space Separator 31140
 
15.0%
Other Punctuation 5553
 
2.7%
Uppercase Letter 5301
 
2.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
23457
14.2%
23091
14.0%
15578
9.4%
13518
 
8.2%
11641
 
7.1%
10307
 
6.2%
8171
 
5.0%
7806
 
4.7%
7130
 
4.3%
6088
 
3.7%
Other values (13) 38224
23.2%
Uppercase Letter
ValueCountFrequency (%)
3660
69.0%
1094
 
20.6%
547
 
10.3%
Other Punctuation
ValueCountFrequency (%)
3279
59.0%
1727
31.1%
547
 
9.9%
Space Separator
ValueCountFrequency (%)
31140
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 170312
82.3%
Common 36693
 
17.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
23457
13.8%
23091
13.6%
15578
 
9.1%
13518
 
7.9%
11641
 
6.8%
10307
 
6.1%
8171
 
4.8%
7806
 
4.6%
7130
 
4.2%
6088
 
3.6%
Other values (16) 43525
25.6%
Common
ValueCountFrequency (%)
31140
84.9%
3279
 
8.9%
1727
 
4.7%
547
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 207005
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
31140
15.0%
23457
11.3%
23091
11.2%
15578
 
7.5%
13518
 
6.5%
11641
 
5.6%
10307
 
5.0%
8171
 
3.9%
7806
 
3.8%
7130
 
3.4%
Other values (20) 55166
26.6%

listening_7
Categorical

Distinct: 4
Distinct (%): 0.1%
Missing: 25210
Missing (%): 88.5%
Memory size: 1.1 MiB
1298 
992 
957 
 
37

Length

Max length: 49
Median length: 39
Mean length: 35.777101
Min length: 23

Characters and Unicode

Total characters: 117492
Distinct characters: 27
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

ValueCountFrequency (%)
1298
 
4.6%
992
 
3.5%
957
 
3.4%
37
 
0.1%
(Missing) 25210
88.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2290
11.4%
2255
11.2%
1986
 
9.9%
1298
 
6.5%
1298
 
6.5%
1298
 
6.5%
1298
 
6.5%
1298
 
6.5%
1029
 
5.1%
992
 
4.9%
Other values (9) 5038
25.1%

Most occurring characters

ValueCountFrequency (%)
16796
14.3%
10536
 
9.0%
9509
 
8.1%
7287
 
6.2%
7254
 
6.2%
6749
 
5.7%
6570
 
5.6%
6568
 
5.6%
5344
 
4.5%
5303
 
4.5%
Other values (17) 35576
30.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 94128
80.1%
Space Separator 16796
 
14.3%
Other Punctuation 3284
 
2.8%
Uppercase Letter 3284
 
2.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
10536
11.2%
9509
 
10.1%
7287
 
7.7%
7254
 
7.7%
6749
 
7.2%
6570
 
7.0%
6568
 
7.0%
5344
 
5.7%
5303
 
5.6%
5233
 
5.6%
Other values (12) 23775
25.3%
Uppercase Letter
ValueCountFrequency (%)
2255
68.7%
992
30.2%
37
 
1.1%
Space Separator
ValueCountFrequency (%)
16796
100.0%
Other Punctuation
ValueCountFrequency (%)
3284
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 97412
82.9%
Common 20080
 
17.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
10536
 
10.8%
9509
 
9.8%
7287
 
7.5%
7254
 
7.4%
6749
 
6.9%
6570
 
6.7%
6568
 
6.7%
5344
 
5.5%
5303
 
5.4%
5233
 
5.4%
Other values (15) 27059
27.8%
Common
ValueCountFrequency (%)
16796
83.6%
3284
 
16.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 117492
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
16796
14.3%
10536
 
9.0%
9509
 
8.1%
7287
 
6.2%
7254
 
6.2%
6749
 
5.7%
6570
 
5.6%
6568
 
5.6%
5344
 
4.5%
5303
 
4.5%
Other values (17) 35576
30.3%

listening_8
Categorical

Distinct: 4
Distinct (%): 0.1%
Missing: 25225
Missing (%): 88.5%
Memory size: 1.1 MiB
1795 
1338 
 
91
 
45

Length

Max length: 54
Median length: 40
Mean length: 41.448455
Min length: 40

Characters and Unicode

Total characters: 135495
Distinct characters: 33
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Common Values

ValueCountFrequency (%)
1795
 
6.3%
1338
 
4.7%
91
 
0.3%
45
 
0.2%
(Missing) 25225
88.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1931
 
7.9%
1886
 
7.7%
1886
 
7.7%
1795
 
7.3%
1795
 
7.3%
1795
 
7.3%
1795
 
7.3%
1383
 
5.6%
1338
 
5.4%
1338
 
5.4%
Other values (20) 7641
31.1%

Most occurring characters

ValueCountFrequency (%)
21314
15.7%
13594
 
10.0%
10812
 
8.0%
8650
 
6.4%
8192
 
6.0%
7831
 
5.8%
7785
 
5.7%
5990
 
4.4%
5567
 
4.1%
4744
 
3.5%
Other values (23) 41016
30.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 101061
74.6%
Space Separator 21314
 
15.7%
Other Punctuation 6582
 
4.9%
Uppercase Letter 5200
 
3.8%
Dash Punctuation 1338
 
1.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
13594
13.5%
10812
10.7%
8650
 
8.6%
8192
 
8.1%
7831
 
7.7%
7785
 
7.7%
5990
 
5.9%
5567
 
5.5%
4744
 
4.7%
4607
 
4.6%
Other values (13) 23289
23.0%
Other Punctuation
ValueCountFrequency (%)
3224
49.0%
1840
28.0%
1473
22.4%
45
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
1976
38.0%
1795
34.5%
1338
25.7%
91
 
1.8%
Space Separator
ValueCountFrequency (%)
21314
100.0%
Dash Punctuation
ValueCountFrequency (%)
1338
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 106261
78.4%
Common 29234
 
21.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
13594
12.8%
10812
 
10.2%
8650
 
8.1%
8192
 
7.7%
7831
 
7.4%
7785
 
7.3%
5990
 
5.6%
5567
 
5.2%
4744
 
4.5%
4607
 
4.3%
Other values (17) 28489
26.8%
Common
ValueCountFrequency (%)
21314
72.9%
3224
 
11.0%
1840
 
6.3%
1473
 
5.0%
1338
 
4.6%
45
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 135495
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
21314
15.7%
13594
 
10.0%
10812
 
8.0%
8650
 
6.4%
8192
 
6.0%
7831
 
5.8%
7785
 
5.7%
5990
 
4.4%
5567
 
4.1%
4744
 
3.5%
Other values (23) 41016
30.3%

listening_9
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.1%
Missing 25213
Missing (%) 88.5%
Memory size 1.2 MiB
2623 
584 
 
38
 
36

Length

Max length 75
Median length 75
Mean length 73.511429
Min length 47

Characters and Unicode

Total characters 241191
Distinct characters 31
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
2623
 
9.2%
584
 
2.0%
38
 
0.1%
36
 
0.1%
(Missing) 25213
88.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
3827
 
8.5%
3281
 
7.3%
2623
 
5.8%
2623
 
5.8%
2623
 
5.8%
2623
 
5.8%
2623
 
5.8%
2623
 
5.8%
2623
 
5.8%
2623
 
5.8%
Other values (26) 16850
37.5%

Most occurring characters

ValueCountFrequency (%)
41661
17.3%
26064
10.8%
25433
10.5%
17674
 
7.3%
16949
 
7.0%
16185
 
6.7%
13634
 
5.7%
10564
 
4.4%
9695
 
4.0%
9149
 
3.8%
Other values (21) 54183
22.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 192272
79.7%
Space Separator 41661
 
17.3%
Other Punctuation 3941
 
1.6%
Uppercase Letter 3317
 
1.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
26064
13.6%
25433
13.2%
17674
9.2%
16949
8.8%
16185
8.4%
13634
 
7.1%
10564
 
5.5%
9695
 
5.0%
9149
 
4.8%
7108
 
3.7%
Other values (13) 39817
20.7%
Uppercase Letter
ValueCountFrequency (%)
2623
79.1%
584
 
17.6%
74
 
2.2%
36
 
1.1%
Other Punctuation
ValueCountFrequency (%)
3281
83.3%
622
 
15.8%
38
 
1.0%
Space Separator
ValueCountFrequency (%)
41661
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 195589
81.1%
Common 45602
 
18.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
26064
13.3%
25433
13.0%
17674
9.0%
16949
 
8.7%
16185
 
8.3%
13634
 
7.0%
10564
 
5.4%
9695
 
5.0%
9149
 
4.7%
7108
 
3.6%
Other values (17) 43134
22.1%
Common
ValueCountFrequency (%)
41661
91.4%
3281
 
7.2%
622
 
1.4%
38
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 241191
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
41661
17.3%
26064
10.8%
25433
10.5%
17674
 
7.3%
16949
 
7.0%
16185
 
6.7%
13634
 
5.7%
10564
 
4.4%
9695
 
4.0%
9149
 
3.8%
Other values (21) 54183
22.5%

percent
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 11232
Missing (%) 39.4%
Memory size 1.0 MiB

Distinct 4
Distinct (%) 0.1%
Missing 25001
Missing (%) 87.7%
Memory size 1005.5 KiB
1919 
1267 
299 
 
8

Length

Max length 11
Median length 7
Mean length 8.7054108
Min length 6

Characters and Unicode

Total characters 30408
Distinct characters 15
Distinct categories 2
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1919
 
6.7%
1267
 
4.4%
299
 
1.0%
8
 
< 0.1%
(Missing) 25001
87.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1919
30.3%
1267
20.0%
1267
20.0%
1267
20.0%
299
 
4.7%
299
 
4.7%
8
 
0.1%

Most occurring characters

ValueCountFrequency (%)
4453
14.6%
3838
12.6%
3485
11.5%
3431
11.3%
3186
10.5%
2833
9.3%
2534
8.3%
2234
7.3%
1919
6.3%
1275
 
4.2%
Other values (5) 1220
 
4.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 27575
90.7%
Space Separator 2833
 
9.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
4453
16.1%
3838
13.9%
3485
12.6%
3431
12.4%
3186
11.6%
2534
9.2%
2234
8.1%
1919
7.0%
1275
 
4.6%
598
 
2.2%
Other values (4) 622
 
2.3%
Space Separator
ValueCountFrequency (%)
2833
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 27575
90.7%
Common 2833
 
9.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
4453
16.1%
3838
13.9%
3485
12.6%
3431
12.4%
3186
11.6%
2534
9.2%
2234
8.1%
1919
7.0%
1275
 
4.6%
598
 
2.2%
Other values (4) 622
 
2.3%
Common
ValueCountFrequency (%)
2833
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 30408
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4453
14.6%
3838
12.6%
3485
11.5%
3431
11.3%
3186
10.5%
2833
9.3%
2534
8.3%
2234
7.3%
1919
6.3%
1275
 
4.2%
Other values (5) 1220
 
4.0%

Distinct 4
Distinct (%) 0.1%
Missing 25140
Missing (%) 88.2%
Memory size 1.1 MiB
2011 
807 
281 
255 

Length

Max length 64
Median length 64
Mean length 59.881038
Min length 49

Characters and Unicode

Total characters 200841
Distinct characters 22
Distinct categories 2
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
2011
 
7.1%
807
 
2.8%
281
 
1.0%
255
 
0.9%
(Missing) 25140
88.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
3073
 
9.5%
2818
 
8.7%
2547
 
7.9%
2292
 
7.1%
2011
 
6.2%
2011
 
6.2%
2011
 
6.2%
2011
 
6.2%
2011
 
6.2%
2011
 
6.2%
Other values (19) 9646
29.7%

Most occurring characters

ValueCountFrequency (%)
29088
14.5%
23749
11.8%
22664
11.3%
18500
9.2%
15295
 
7.6%
12609
 
6.3%
12047
 
6.0%
10168
 
5.1%
7912
 
3.9%
6718
 
3.3%
Other values (12) 42091
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 171753
85.5%
Space Separator 29088
 
14.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
23749
13.8%
22664
13.2%
18500
10.8%
15295
8.9%
12609
 
7.3%
12047
 
7.0%
10168
 
5.9%
7912
 
4.6%
6718
 
3.9%
6453
 
3.8%
Other values (11) 35638
20.7%
Space Separator
ValueCountFrequency (%)
29088
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 171753
85.5%
Common 29088
 
14.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
23749
13.8%
22664
13.2%
18500
10.8%
15295
8.9%
12609
 
7.3%
12047
 
7.0%
10168
 
5.9%
7912
 
4.6%
6718
 
3.9%
6453
 
3.8%
Other values (11) 35638
20.7%
Common
ValueCountFrequency (%)
29088
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 200841
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
29088
14.5%
23749
11.8%
22664
11.3%
18500
9.2%
15295
 
7.6%
12609
 
6.3%
12047
 
6.0%
10168
 
5.1%
7912
 
3.9%
6718
 
3.3%
Other values (12) 42091
21.0%

Distinct 4
Distinct (%) 0.1%
Missing 25377
Missing (%) 89.1%
Memory size 1.0 MiB
2091 
545 
252 
229 

Length

Max length 44
Median length 20
Mean length 25.665704
Min length 20

Characters and Unicode

Total characters 80000
Distinct characters 24
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
2091
 
7.3%
545
 
1.9%
252
 
0.9%
229
 
0.8%
(Missing) 25377
89.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
3117
20.0%
2343
15.1%
2091
13.4%
2091
13.4%
1548
9.9%
774
 
5.0%
774
 
5.0%
545
 
3.5%
545
 
3.5%
545
 
3.5%
Other values (5) 1191
 
7.7%

Most occurring characters

ValueCountFrequency (%)
13805
17.3%
12447
15.6%
11444
14.3%
7324
9.2%
6756
8.4%
5185
 
6.5%
4436
 
5.5%
3117
 
3.9%
3117
 
3.9%
2091
 
2.6%
Other values (14) 10278
12.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 61319
76.6%
Space Separator 12447
 
15.6%
Uppercase Letter 3117
 
3.9%
Other Punctuation 3117
 
3.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
13805
22.5%
11444
18.7%
7324
11.9%
6756
11.0%
5185
 
8.5%
4436
 
7.2%
2091
 
3.4%
2029
 
3.3%
1548
 
2.5%
1342
 
2.2%
Other values (11) 5359
 
8.7%
Space Separator
ValueCountFrequency (%)
12447
100.0%
Uppercase Letter
ValueCountFrequency (%)
3117
100.0%
Other Punctuation
ValueCountFrequency (%)
3117
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 64436
80.5%
Common 15564
 
19.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
13805
21.4%
11444
17.8%
7324
11.4%
6756
10.5%
5185
 
8.0%
4436
 
6.9%
3117
 
4.8%
2091
 
3.2%
2029
 
3.1%
1548
 
2.4%
Other values (12) 6701
10.4%
Common
ValueCountFrequency (%)
12447
80.0%
3117
 
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 80000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
13805
17.3%
12447
15.6%
11444
14.3%
7324
9.2%
6756
8.4%
5185
 
6.5%
4436
 
5.5%
3117
 
3.9%
3117
 
3.9%
2091
 
2.6%
Other values (14) 10278
12.8%

readingComprehension_2
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.1%
Missing 25023
Missing (%) 87.8%
Memory size 1005.8 KiB
3156 
 
198
 
91
 
26

Length

Max length 11
Median length 9
Mean length 8.9841544
Min length 8

Characters and Unicode

Total characters 31184
Distinct characters 17
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
3156
 
11.1%
198
 
0.7%
91
 
0.3%
26
 
0.1%
(Missing) 25023
87.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
3156
90.9%
198
 
5.7%
91
 
2.6%
26
 
0.7%

Most occurring characters

ValueCountFrequency (%)
9900
31.7%
3471
 
11.1%
3364
 
10.8%
3354
 
10.8%
3273
 
10.5%
3247
 
10.4%
3156
 
10.1%
289
 
0.9%
224
 
0.7%
198
 
0.6%
Other values (7) 708
 
2.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 31184
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
9900
31.7%
3471
 
11.1%
3364
 
10.8%
3354
 
10.8%
3273
 
10.5%
3247
 
10.4%
3156
 
10.1%
289
 
0.9%
224
 
0.7%
198
 
0.6%
Other values (7) 708
 
2.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 31184
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
9900
31.7%
3471
 
11.1%
3364
 
10.8%
3354
 
10.8%
3273
 
10.5%
3247
 
10.4%
3156
 
10.1%
289
 
0.9%
224
 
0.7%
198
 
0.6%
Other values (7) 708
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 31184
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
9900
31.7%
3471
 
11.1%
3364
 
10.8%
3354
 
10.8%
3273
 
10.5%
3247
 
10.4%
3156
 
10.1%
289
 
0.9%
224
 
0.7%
198
 
0.6%
Other values (7) 708
 
2.3%

readingComprehension_3
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.1%
Missing 25000
Missing (%) 87.7%
Memory size 1016.0 KiB
3167 
 
145
 
101
 
81

Length

Max length 12
Median length 12
Mean length 11.763595
Min length 7

Characters and Unicode

Total characters 41102
Distinct characters 17
Distinct categories 2
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
3167
 
11.1%
145
 
0.5%
101
 
0.4%
81
 
0.3%
(Missing) 25000
87.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
3494
33.8%
3167
30.6%
3167
30.6%
182
 
1.8%
145
 
1.4%
101
 
1.0%
81
 
0.8%

Most occurring characters

ValueCountFrequency (%)
6843
16.6%
6334
15.4%
6334
15.4%
3514
8.5%
3494
8.5%
3494
8.5%
3494
8.5%
3349
8.1%
3167
7.7%
283
 
0.7%
Other values (7) 796
 
1.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 34259
83.4%
Space Separator 6843
 
16.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
6334
18.5%
6334
18.5%
3514
10.3%
3494
10.2%
3494
10.2%
3494
10.2%
3349
9.8%
3167
9.2%
283
 
0.8%
182
 
0.5%
Other values (6) 614
 
1.8%
Space Separator
ValueCountFrequency (%)
6843
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 34259
83.4%
Common 6843
 
16.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
6334
18.5%
6334
18.5%
3514
10.3%
3494
10.2%
3494
10.2%
3494
10.2%
3349
9.8%
3167
9.2%
283
 
0.8%
182
 
0.5%
Other values (6) 614
 
1.8%
Common
ValueCountFrequency (%)
6843
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 41102
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6843
16.6%
6334
15.4%
6334
15.4%
3514
8.5%
3494
8.5%
3494
8.5%
3494
8.5%
3349
8.1%
3167
7.7%
283
 
0.7%
Other values (7) 796
 
1.9%

Distinct 4
Distinct (%) 0.1%
Missing 25007
Missing (%) 87.8%
Memory size 1002.4 KiB
1423 
890 
785 
389 

Length

Max length 10
Median length 8
Mean length 7.8554631
Min length 6

Characters and Unicode

Total characters 27392
Distinct characters 15
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1423
 
5.0%
890
 
3.1%
785
 
2.8%
389
 
1.4%
(Missing) 25007
87.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1423
40.8%
890
25.5%
785
22.5%
389
 
11.2%

Most occurring characters

ValueCountFrequency (%)
5299
19.3%
3487
12.7%
3487
12.7%
3098
11.3%
2846
10.4%
1812
 
6.6%
1423
 
5.2%
1423
 
5.2%
890
 
3.2%
890
 
3.2%
Other values (5) 2737
10.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 27392
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
5299
19.3%
3487
12.7%
3487
12.7%
3098
11.3%
2846
10.4%
1812
 
6.6%
1423
 
5.2%
1423
 
5.2%
890
 
3.2%
890
 
3.2%
Other values (5) 2737
10.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 27392
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
5299
19.3%
3487
12.7%
3487
12.7%
3098
11.3%
2846
10.4%
1812
 
6.6%
1423
 
5.2%
1423
 
5.2%
890
 
3.2%
890
 
3.2%
Other values (5) 2737
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 27392
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5299
19.3%
3487
12.7%
3487
12.7%
3098
11.3%
2846
10.4%
1812
 
6.6%
1423
 
5.2%
1423
 
5.2%
890
 
3.2%
890
 
3.2%
Other values (5) 2737
10.0%

readingComprehension_5
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.1%
Missing 25012
Missing (%) 87.8%
Memory size 1012.6 KiB
3029 
 
226
 
207
 
20

Length

Max length 11
Median length 11
Mean length 10.875359
Min length 9

Characters and Unicode

Total characters 37868
Distinct characters 16
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
3029
 
10.6%
226
 
0.8%
207
 
0.7%
20
 
0.1%
(Missing) 25012
87.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
3029
87.0%
226
 
6.5%
207
 
5.9%
20
 
0.6%

Most occurring characters

ValueCountFrequency (%)
9787
25.8%
3688
 
9.7%
3501
 
9.2%
3482
 
9.2%
3275
 
8.6%
3236
 
8.5%
3236
 
8.5%
3049
 
8.1%
3029
 
8.0%
640
 
1.7%
Other values (6) 945
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 37868
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
9787
25.8%
3688
 
9.7%
3501
 
9.2%
3482
 
9.2%
3275
 
8.6%
3236
 
8.5%
3236
 
8.5%
3049
 
8.1%
3029
 
8.0%
640
 
1.7%
Other values (6) 945
 
2.5%

Most occurring scripts

ValueCountFrequency (%)
Latin 37868
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
9787
25.8%
3688
 
9.7%
3501
 
9.2%
3482
 
9.2%
3275
 
8.6%
3236
 
8.5%
3236
 
8.5%
3049
 
8.1%
3029
 
8.0%
640
 
1.7%
Other values (6) 945
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 37868
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
9787
25.8%
3688
 
9.7%
3501
 
9.2%
3482
 
9.2%
3275
 
8.6%
3236
 
8.5%
3236
 
8.5%
3049
 
8.1%
3029
 
8.0%
640
 
1.7%
Other values (6) 945
 
2.5%

Distinct 4
Distinct (%) 0.1%
Missing 25018
Missing (%) 87.8%
Memory size 1001.7 KiB
2452 
705 
 
160
 
159

Length

Max length 10
Median length 7
Mean length 7.7459724
Min length 7

Characters and Unicode

Total characters 26925
Distinct characters 17
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
2452
 
8.6%
705
 
2.5%
160
 
0.6%
159
 
0.6%
(Missing) 25018
87.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2452
70.5%
705
 
20.3%
160
 
4.6%
159
 
4.6%

Most occurring characters

ValueCountFrequency (%)
5063
18.8%
4182
15.5%
3476
12.9%
3157
11.7%
2771
10.3%
2611
9.7%
1410
 
5.2%
705
 
2.6%
705
 
2.6%
705
 
2.6%
Other values (7) 2140
7.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 26925
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
5063
18.8%
4182
15.5%
3476
12.9%
3157
11.7%
2771
10.3%
2611
9.7%
1410
 
5.2%
705
 
2.6%
705
 
2.6%
705
 
2.6%
Other values (7) 2140
7.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 26925
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
5063
18.8%
4182
15.5%
3476
12.9%
3157
11.7%
2771
10.3%
2611
9.7%
1410
 
5.2%
705
 
2.6%
705
 
2.6%
705
 
2.6%
Other values (7) 2140
7.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 26925
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5063
18.8%
4182
15.5%
3476
12.9%
3157
11.7%
2771
10.3%
2611
9.7%
1410
 
5.2%
705
 
2.6%
705
 
2.6%
705
 
2.6%
Other values (7) 2140
7.9%

Distinct 4
Distinct (%) 0.1%
Missing 25000
Missing (%) 87.7%
Memory size 1.0 MiB
1875 
1530 
 
46
 
43

Length

Max length 23
Median length 23
Mean length 18.381225
Min length 11

Characters and Unicode

Total characters 64224
Distinct characters 19
Distinct categories 2
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1875
 
6.6%
1530
 
5.4%
46
 
0.2%
43
 
0.2%
(Missing) 25000
87.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1921
15.5%
1875
15.2%
1875
15.2%
1875
15.2%
1573
12.7%
1530
12.4%
1530
12.4%
46
 
0.4%
46
 
0.4%
43
 
0.3%

Most occurring characters

ValueCountFrequency (%)
8863
13.8%
8518
13.3%
5717
8.9%
5671
8.8%
5671
8.8%
5369
8.4%
4978
7.8%
3497
 
5.4%
3494
 
5.4%
3451
 
5.4%
Other values (9) 8995
14.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 55361
86.2%
Space Separator 8863
 
13.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
8518
15.4%
5717
10.3%
5671
10.2%
5671
10.2%
5369
9.7%
4978
9.0%
3497
6.3%
3494
6.3%
3451
6.2%
1921
 
3.5%
Other values (8) 7074
12.8%
Space Separator
ValueCountFrequency (%)
8863
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 55361
86.2%
Common 8863
 
13.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
8518
15.4%
5717
10.3%
5671
10.2%
5671
10.2%
5369
9.7%
4978
9.0%
3497
6.3%
3494
6.3%
3451
6.2%
1921
 
3.5%
Other values (8) 7074
12.8%
Common
ValueCountFrequency (%)
8863
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 64224
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
8863
13.8%
8518
13.3%
5717
8.9%
5671
8.8%
5671
8.8%
5369
8.4%
4978
7.8%
3497
 
5.4%
3494
 
5.4%
3451
 
5.4%
Other values (9) 8995
14.0%

Distinct 4
Distinct (%) 0.1%
Missing 25030
Missing (%) 87.8%
Memory size 1007.6 KiB
2574 
502 
277 
 
111

Length

Max length 10
Median length 10
Mean length 9.5840069
Min length 6

Characters and Unicode

Total characters 33199
Distinct characters 15
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
2574
 
9.0%
502
 
1.8%
277
 
1.0%
111
 
0.4%
(Missing) 25030
87.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2574
74.3%
502
 
14.5%
277
 
8.0%
111
 
3.2%

Most occurring characters

ValueCountFrequency (%)
7722
23.3%
5650
17.0%
3076
 
9.3%
3076
 
9.3%
3076
 
9.3%
2851
 
8.6%
2574
 
7.8%
1503
 
4.5%
1001
 
3.0%
890
 
2.7%
Other values (5) 1780
 
5.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 33199
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
7722
23.3%
5650
17.0%
3076
 
9.3%
3076
 
9.3%
3076
 
9.3%
2851
 
8.6%
2574
 
7.8%
1503
 
4.5%
1001
 
3.0%
890
 
2.7%
Other values (5) 1780
 
5.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 33199
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
7722
23.3%
5650
17.0%
3076
 
9.3%
3076
 
9.3%
3076
 
9.3%
2851
 
8.6%
2574
 
7.8%
1503
 
4.5%
1001
 
3.0%
890
 
2.7%
Other values (5) 1780
 
5.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 33199
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
7722
23.3%
5650
17.0%
3076
 
9.3%
3076
 
9.3%
3076
 
9.3%
2851
 
8.6%
2574
 
7.8%
1503
 
4.5%
1001
 
3.0%
890
 
2.7%
Other values (5) 1780
 
5.4%

Distinct 4
Distinct (%) 0.1%
Missing 25014
Missing (%) 87.8%
Memory size 1009.0 KiB
1790 
1397 
188 
 
105

Length

Max length 12
Median length 9
Mean length 9.8436782
Min length 4

Characters and Unicode

Total characters 34256
Distinct characters 14
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1790
 
6.3%
1397
 
4.9%
188
 
0.7%
105
 
0.4%
(Missing) 25014
87.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1790
51.4%
1397
40.1%
188
 
5.4%
105
 
3.0%

Most occurring characters

ValueCountFrequency (%)
7483
21.8%
4977
14.5%
3563
10.4%
3480
10.2%
3292
9.6%
3187
9.3%
1895
 
5.5%
1790
 
5.2%
1397
 
4.1%
1397
 
4.1%
Other values (4) 1795
 
5.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 34256
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
7483
21.8%
4977
14.5%
3563
10.4%
3480
10.2%
3292
9.6%
3187
9.3%
1895
 
5.5%
1790
 
5.2%
1397
 
4.1%
1397
 
4.1%
Other values (4) 1795
 
5.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 34256
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
7483
21.8%
4977
14.5%
3563
10.4%
3480
10.2%
3292
9.6%
3187
9.3%
1895
 
5.5%
1790
 
5.2%
1397
 
4.1%
1397
 
4.1%
Other values (4) 1795
 
5.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 34256
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
7483
21.8%
4977
14.5%
3563
10.4%
3480
10.2%
3292
9.6%
3187
9.3%
1895
 
5.5%
1790
 
5.2%
1397
 
4.1%
1397
 
4.1%
Other values (4) 1795
 
5.2%

score
Real number (ℝ)

Distinct 34
Distinct (%) 0.2%
Missing 11229
Missing (%) 39.4%
Infinite 0
Infinite (%) 0.0%
Mean 10.793223
Minimum 0
Maximum 33
Zeros 25
Zeros (%) 0.1%
Negative 0
Negative (%) 0.0%
Memory size 222.7 KiB

Quantile statistics

Minimum 0
5-th percentile 4
Q1 7
median 9
Q3 13
95-th percentile 27
Maximum 33
Range 33
Interquartile range (IQR) 6

Descriptive statistics

Standard deviation 6.1324478
Coefficient of variation (CV) 0.56817576
Kurtosis 2.2937556
Mean 10.793223
Median Absolute Deviation (MAD) 3
Skewness 1.5992262
Sum 186345
Variance 37.606917
Monotonicity Not monotonic
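Several of the figures reported for score are tied together by simple identities, which gives a quick consistency check. The sketch below assumes the conventional definitions (CV as standard deviation over mean, variance as the squared standard deviation, IQR as Q3 minus Q1, mean as sum over the non-missing count) and takes the total of 28494 observations from the report overview. The variable names are illustrative only.

```python
# Values copied from the report for the `score` variable.
n_total = 28494      # number of observations, from the dataset overview
n_missing = 11229
mean = 10.793223
std = 6.1324478
total = 186345       # reported "Sum"
q1, q3 = 7, 13

n_present = n_total - n_missing            # rows that actually carry a score
cv = std / mean                            # coefficient of variation
variance = std ** 2
iqr = q3 - q1
mean_from_sum = total / n_present
missing_pct = 100 * n_missing / n_total

print(f"non-missing rows: {n_present}")
print(f"CV: {cv:.8f}")
print(f"variance: {variance:.6f}")
print(f"IQR: {iqr}")
print(f"mean from sum: {mean_from_sum:.6f}")
print(f"missing: {missing_pct:.1f}%")
```

Under these assumptions the recomputed values agree with the reported ones to within rounding.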
Histogram with fixed size bins (bins=34)
Value Count Frequency (%)
8 2040 7.2%
7 1896 6.7%
9 1558 5.5%
6 1538 5.4%
13 1472 5.2%
12 1289 4.5%
14 1181 4.1%
10 1120 3.9%
11 1057 3.7%
5 990 3.5%
Other values (24) 3124 11.0%
(Missing) 11229 39.4%

Value Count Frequency (%)
0 25 0.1%
1 55 0.2%
2 121 0.4%
3 272 1.0%
4 599 2.1%
5 990 3.5%
6 1538 5.4%
7 1896 6.7%
8 2040 7.2%
9 1558 5.5%

Value Count Frequency (%)
33 14 < 0.1%
32 16 0.1%
31 49 0.2%
30 124 0.4%
29 193 0.7%
28 252 0.9%
27 258 0.9%
26 250 0.9%
25 180 0.6%
24 155 0.5%
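The value, count and frequency tables for score follow a recognisable pattern: the most frequent values are listed individually, the remainder is pooled into an "Other values" row, and missing entries are counted separately, with percentages taken over all observations. A minimal sketch of that pattern is given below. It assumes missing entries are represented as `None`, and `common_values` is a hypothetical helper, not the profiling tool's actual code.

```python
from collections import Counter


def common_values(values, top_n=10):
    """Summarise values as (top rows, other count, missing count).

    Each row is (value, count, percent of all observations, missing included).
    """
    total = len(values)
    present = [v for v in values if v is not None]
    top = Counter(present).most_common(top_n)
    rows = [(value, count, 100 * count / total) for value, count in top]
    # Everything that is present but not in the top-N rows is pooled together.
    other = len(present) - sum(count for _, count, _ in rows)
    missing = total - len(present)
    return rows, other, missing


# Hypothetical sample, not taken from the dataset.
sample = [8, 8, 8, 7, 7, 9, None, None]
rows, other, missing = common_values(sample, top_n=2)

for value, count, pct in rows:
    print(f"{value} {count} {pct:.1f}%")
print(f"Other values {other}")
print(f"(Missing) {missing}")
```

The listed counts, the pooled remainder and the missing count together account for every observation, which matches the report's score table (14141 + 3124 + 11229 = 28494).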

scoreBreakdown.pickIncorrect
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 28494
Missing (%) 100.0%
Memory size 222.7 KiB

scoreBreakdown.tenses
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 28494
Missing (%) 100.0%
Memory size 222.7 KiB

scoreBreakdown.wordSelection
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 28494
Missing (%) 100.0%
Memory size 222.7 KiB

situationalJudgement_1
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 25837
Missing (%) 90.7%
Memory size 1.5 MiB
2388 
 
210
 
46
 
13

Length

Max length 233
Median length 233
Mean length 219.28039
Min length 4

Characters and Unicode

Total characters 582628
Distinct characters 32
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
2388
 
8.4%
210
 
0.7%
46
 
0.2%
13
 
< 0.1%
(Missing) 25837
90.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
15050
 
15.4%
5032
 
5.2%
4986
 
5.1%
4776
 
4.9%
2808
 
2.9%
2598
 
2.7%
2598
 
2.7%
2434
 
2.5%
2434
 
2.5%
2434
 
2.5%
Other values (41) 52439
53.7%

Most occurring characters

ValueCountFrequency (%)
95188
16.3%
59813
10.3%
57642
9.9%
46980
 
8.1%
40059
 
6.9%
35073
 
6.0%
32580
 
5.6%
30310
 
5.2%
28106
 
4.8%
26708
 
4.6%
Other values (22) 130169
22.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 472495
81.1%
Space Separator 95188
 
16.3%
Other Punctuation 9900
 
1.7%
Uppercase Letter 5045
 
0.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
59813
12.7%
57642
12.2%
46980
9.9%
40059
8.5%
35073
 
7.4%
32580
 
6.9%
30310
 
6.4%
28106
 
5.9%
26708
 
5.7%
22214
 
4.7%
Other values (13) 93010
19.7%
Uppercase Letter
ValueCountFrequency (%)
2388
47.3%
2388
47.3%
210
 
4.2%
46
 
0.9%
13
 
0.3%
Other Punctuation
ValueCountFrequency (%)
5032
50.8%
2480
25.1%
2388
24.1%
Space Separator
ValueCountFrequency (%)
95188
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 477540
82.0%
Common 105088
 
18.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
59813
12.5%
57642
12.1%
46980
9.8%
40059
8.4%
35073
 
7.3%
32580
 
6.8%
30310
 
6.3%
28106
 
5.9%
26708
 
5.6%
22214
 
4.7%
Other values (18) 98055
20.5%
Common
ValueCountFrequency (%)
95188
90.6%
5032
 
4.8%
2480
 
2.4%
2388
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 582628
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
95188
16.3%
59813
10.3%
57642
9.9%
46980
 
8.1%
40059
 
6.9%
35073
 
6.0%
32580
 
5.6%
30310
 
5.2%
28106
 
4.8%
26708
 
4.6%
Other values (22) 130169
22.3%

situationalJudgement_10
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 26378
Missing (%) 92.6%
Memory size 1.3 MiB
1953 
 
94
 
59
 
10

Length

Max length 219
Median length 219
Mean length 207.94234
Min length 7

Characters and Unicode

Total characters 440006
Distinct characters 31
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1953
 
6.9%
94
 
0.3%
59
 
0.2%
10
 
< 0.1%
(Missing) 26378
92.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
6151
 
8.9%
4059
 
5.9%
4010
 
5.8%
3906
 
5.7%
2057
 
3.0%
2047
 
3.0%
2047
 
3.0%
1963
 
2.9%
1963
 
2.9%
1953
 
2.8%
Other values (39) 38611
56.1%

Most occurring characters

ValueCountFrequency (%)
66651
15.1%
42003
 
9.5%
36163
 
8.2%
35758
 
8.1%
32238
 
7.3%
30379
 
6.9%
20277
 
4.6%
20030
 
4.6%
18087
 
4.1%
17918
 
4.1%
Other values (21) 120502
27.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 365217
83.0%
Space Separator 66651
 
15.1%
Uppercase Letter 4128
 
0.9%
Other Punctuation 4010
 
0.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
42003
11.5%
36163
 
9.9%
35758
 
9.8%
32238
 
8.8%
30379
 
8.3%
20277
 
5.6%
20030
 
5.5%
18087
 
5.0%
17918
 
4.9%
15916
 
4.4%
Other values (14) 96448
26.4%
Uppercase Letter
ValueCountFrequency (%)
1953
47.3%
1953
47.3%
104
 
2.5%
59
 
1.4%
59
 
1.4%
Space Separator
ValueCountFrequency (%)
66651
100.0%
Other Punctuation
ValueCountFrequency (%)
4010
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 369345
83.9%
Common 70661
 
16.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
42003
11.4%
36163
 
9.8%
35758
 
9.7%
32238
 
8.7%
30379
 
8.2%
20277
 
5.5%
20030
 
5.4%
18087
 
4.9%
17918
 
4.9%
15916
 
4.3%
Other values (19) 100576
27.2%
Common
ValueCountFrequency (%)
66651
94.3%
4010
 
5.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 440006
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
66651
15.1%
42003
 
9.5%
36163
 
8.2%
35758
 
8.1%
32238
 
7.3%
30379
 
6.9%
20277
 
4.6%
20030
 
4.6%
18087
 
4.1%
17918
 
4.1%
Other values (21) 120502
27.4%

situationalJudgement_11
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 25957
Missing (%) 91.1%
Memory size 1.4 MiB
2343 
 
107
 
59
 
28

Length

Max length 196
Median length 196
Mean length 184.60741
Min length 18

Characters and Unicode

Total characters 468349
Distinct characters 31
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
2343
 
8.2%
107
 
0.4%
59
 
0.2%
28
 
0.1%
(Missing) 25957
91.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
4987
 
5.6%
4714
 
5.3%
4714
 
5.3%
4686
 
5.3%
4686
 
5.3%
4686
 
5.3%
2478
 
2.8%
2478
 
2.8%
2450
 
2.8%
2343
 
2.7%
Other values (38) 50088
56.7%

Most occurring characters

ValueCountFrequency (%)
85880
18.3%
45583
 
9.7%
40784
 
8.7%
28529
 
6.1%
26614
 
5.7%
26313
 
5.6%
26122
 
5.6%
23759
 
5.1%
21385
 
4.6%
18907
 
4.0%
Other values (21) 124473
26.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 370366
79.1%
Space Separator 85880
 
18.3%
Other Punctuation 7223
 
1.5%
Uppercase Letter 4880
 
1.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
45583
12.3%
40784
11.0%
28529
 
7.7%
26614
 
7.2%
26313
 
7.1%
26122
 
7.1%
23759
 
6.4%
21385
 
5.8%
18907
 
5.1%
16429
 
4.4%
Other values (13) 95941
25.9%
Uppercase Letter
ValueCountFrequency (%)
2343
48.0%
2343
48.0%
107
 
2.2%
59
 
1.2%
28
 
0.6%
Other Punctuation
ValueCountFrequency (%)
4880
67.6%
2343
32.4%
Space Separator
ValueCountFrequency (%)
85880
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 375246
80.1%
Common 93103
 
19.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
45583
12.1%
40784
10.9%
28529
 
7.6%
26614
 
7.1%
26313
 
7.0%
26122
 
7.0%
23759
 
6.3%
21385
 
5.7%
18907
 
5.0%
16429
 
4.4%
Other values (18) 100821
26.9%
Common
ValueCountFrequency (%)
85880
92.2%
4880
 
5.2%
2343
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 468349
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
85880
18.3%
45583
 
9.7%
40784
 
8.7%
28529
 
6.1%
26614
 
5.7%
26313
 
5.6%
26122
 
5.6%
23759
 
5.1%
21385
 
4.6%
18907
 
4.0%
Other values (21) 124473
26.6%

Distinct 4
Distinct (%) 0.2%
Missing 26433
Missing (%) 92.8%
Memory size 1.0 MiB
975 
891 
102 
 
93

Length

Max length97
Median length79
Mean length52.043668
Min length7

Characters and Unicode

Total characters107262
Distinct characters29
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Common Values

ValueCountFrequency (%)
975
 
3.4%
891
 
3.1%
102
 
0.4%
93
 
0.3%
(Missing) 26433
92.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2172
 
9.4%
1884
 
8.2%
1875
 
8.2%
1086
 
4.7%
975
 
4.2%
975
 
4.2%
975
 
4.2%
891
 
3.9%
891
 
3.9%
891
 
3.9%
Other values (24) 10386
45.2%

Most occurring characters

ValueCountFrequency (%)
20940
19.5%
13746
12.8%
11166
10.4%
10368
9.7%
6822
 
6.4%
3870
 
3.6%
3861
 
3.6%
3861
 
3.6%
3537
 
3.3%
3453
 
3.2%
Other values (19) 25638
23.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 82107
76.5%
Space Separator 20940
 
19.5%
Uppercase Letter 3036
 
2.8%
Other Punctuation 1179
 
1.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
13746
16.7%
11166
13.6%
10368
12.6%
6822
 
8.3%
3870
 
4.7%
3861
 
4.7%
3861
 
4.7%
3537
 
4.3%
3453
 
4.2%
3360
 
4.1%
Other values (12) 18063
22.0%
Uppercase Letter
ValueCountFrequency (%)
1077
35.5%
975
32.1%
891
29.3%
93
 
3.1%
Other Punctuation
ValueCountFrequency (%)
1086
92.1%
93
 
7.9%
Space Separator
ValueCountFrequency (%)
20940
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 85143
79.4%
Common 22119
 
20.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
13746
16.1%
11166
13.1%
10368
12.2%
6822
 
8.0%
3870
 
4.5%
3861
 
4.5%
3861
 
4.5%
3537
 
4.2%
3453
 
4.1%
3360
 
3.9%
Other values (16) 21099
24.8%
Common
ValueCountFrequency (%)
20940
94.7%
1086
 
4.9%
93
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 107262
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
20940
19.5%
13746
12.8%
11166
10.4%
10368
9.7%
6822
 
6.4%
3870
 
3.6%
3861
 
3.6%
3861
 
3.6%
3537
 
3.3%
3453
 
3.2%
Other values (19) 25638
23.9%

situationalJudgement_13
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 25968
Missing (%) 91.1%
Memory size 1.4 MiB

Length

Max length 181
Median length 181
Mean length 178.72367
Min length 18

Characters and Unicode

Total characters 451456
Distinct characters 30
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

Value Count Frequency (%)
2480 8.7%
29 0.1%
9 < 0.1%
8 < 0.1%
(Missing) 25968 91.1%
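
The IMBALANCE alert on this variable reflects how strongly one category dominates the non-missing values. One plausible way to quantify that is one minus the Shannon entropy normalised by its maximum. Whether this report used exactly that formula is an assumption, and the function name `imbalance_score` is illustrative. The sketch applies it to the four non-missing counts listed above.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform split, near 1 when one class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    # Shannon entropy in bits over the observed category frequencies.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Non-missing category counts reported for situationalJudgement_13.
score = imbalance_score([2480, 29, 9, 8])
print(f"imbalance score: {score:.3f}")
```

A score this close to 1 means the column carries very little information beyond its dominant answer, which matters when deciding whether to keep it as a feature.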

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
7495
 
9.7%
4960
 
6.4%
4960
 
6.4%
4960
 
6.4%
4960
 
6.4%
2509
 
3.2%
2488
 
3.2%
2480
 
3.2%
2480
 
3.2%
2480
 
3.2%
Other values (33) 37629
48.6%

Most occurring characters

ValueCountFrequency (%)
74875
16.6%
49921
11.1%
32516
 
7.2%
32436
 
7.2%
27384
 
6.1%
27327
 
6.1%
24884
 
5.5%
24867
 
5.5%
20040
 
4.4%
19877
 
4.4%
Other values (20) 117329
26.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 359129
79.5%
Space Separator 74875
 
16.6%
Other Punctuation 12446
 
2.8%
Uppercase Letter 5006
 
1.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
49921
13.9%
32516
 
9.1%
32436
 
9.0%
27384
 
7.6%
27327
 
7.6%
24884
 
6.9%
24867
 
6.9%
20040
 
5.6%
19877
 
5.5%
17494
 
4.9%
Other values (11) 82383
22.9%
Uppercase Letter
ValueCountFrequency (%)
2480
49.5%
2480
49.5%
29
 
0.6%
9
 
0.2%
8
 
0.2%
Other Punctuation
ValueCountFrequency (%)
5006
40.2%
4960
39.9%
2480
19.9%
Space Separator
ValueCountFrequency (%)
74875
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 364135
80.7%
Common 87321
 
19.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
49921
13.7%
32516
 
8.9%
32436
 
8.9%
27384
 
7.5%
27327
 
7.5%
24884
 
6.8%
24867
 
6.8%
20040
 
5.5%
19877
 
5.5%
17494
 
4.8%
Other values (16) 87389
24.0%
Common
ValueCountFrequency (%)
74875
85.7%
5006
 
5.7%
4960
 
5.7%
2480
 
2.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 451456
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
74875
16.6%
49921
11.1%
32516
 
7.2%
32436
 
7.2%
27384
 
6.1%
27327
 
6.1%
24884
 
5.5%
24867
 
5.5%
20040
 
4.4%
19877
 
4.4%
Other values (20) 117329
26.0%

situationalJudgement_14
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.1%
Missing 25826
Missing (%) 90.6%
Memory size 1.6 MiB

Length

Max length 247
Median length 247
Mean length 243.15817
Min length 7

Characters and Unicode

Total characters 648746
Distinct characters 30
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

Value Count Frequency (%)
2619 9.2%
24 0.1%
18 0.1%
7 < 0.1%
(Missing) 25826 90.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
13126
 
11.6%
5262
 
4.7%
5262
 
4.7%
5256
 
4.7%
5238
 
4.6%
2643
 
2.3%
2643
 
2.3%
2626
 
2.3%
2626
 
2.3%
2619
 
2.3%
Other values (37) 65700
58.1%

Most occurring characters

ValueCountFrequency (%)
115571
17.8%
68266
10.5%
57838
 
8.9%
44640
 
6.9%
44586
 
6.9%
36687
 
5.7%
34198
 
5.3%
34096
 
5.3%
31562
 
4.9%
21031
 
3.2%
Other values (20) 160271
24.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 519951
80.1%
Space Separator 115571
 
17.8%
Other Punctuation 7919
 
1.2%
Uppercase Letter 5305
 
0.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
68266
13.1%
57838
11.1%
44640
 
8.6%
44586
 
8.6%
36687
 
7.1%
34198
 
6.6%
34096
 
6.6%
31562
 
6.1%
21031
 
4.0%
21021
 
4.0%
Other values (12) 126026
24.2%
Uppercase Letter
ValueCountFrequency (%)
2637
49.7%
2619
49.4%
24
 
0.5%
18
 
0.3%
7
 
0.1%
Other Punctuation
ValueCountFrequency (%)
5269
66.5%
2650
33.5%
Space Separator
ValueCountFrequency (%)
115571
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 525256
81.0%
Common 123490
 
19.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
68266
13.0%
57838
11.0%
44640
 
8.5%
44586
 
8.5%
36687
 
7.0%
34198
 
6.5%
34096
 
6.5%
31562
 
6.0%
21031
 
4.0%
21021
 
4.0%
Other values (17) 131331
25.0%
Common
ValueCountFrequency (%)
115571
93.6%
5269
 
4.3%
2650
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 648746
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
115571
17.8%
68266
10.5%
57838
 
8.9%
44640
 
6.9%
44586
 
6.9%
36687
 
5.7%
34198
 
5.3%
34096
 
5.3%
31562
 
4.9%
21031
 
3.2%
Other values (20) 160271
24.7%

situationalJudgement_15
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 26387
Missing (%) 92.6%
Memory size 1.4 MiB

Length

Max length 227
Median length 227
Mean length 225.04746
Min length 18

Characters and Unicode

Total characters 474175
Distinct characters 33
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

Value Count Frequency (%)
2069 7.3%
22 0.1%
9 < 0.1%
7 < 0.1%
(Missing) 26387 92.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
16599
20.9%
6216
 
7.8%
4147
 
5.2%
4147
 
5.2%
2091
 
2.6%
2091
 
2.6%
2091
 
2.6%
2091
 
2.6%
2078
 
2.6%
2069
 
2.6%
Other values (51) 35835
45.1%

Most occurring characters

ValueCountFrequency (%)
79417
16.7%
52224
11.0%
45949
 
9.7%
29128
 
6.1%
27087
 
5.7%
25176
 
5.3%
24983
 
5.3%
20980
 
4.4%
20878
 
4.4%
20779
 
4.4%
Other values (23) 127574
26.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 380111
80.2%
Space Separator 79417
 
16.7%
Other Punctuation 8380
 
1.8%
Uppercase Letter 6267
 
1.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
52224
13.7%
45949
12.1%
29128
 
7.7%
27087
 
7.1%
25176
 
6.6%
24983
 
6.6%
20980
 
5.5%
20878
 
5.5%
20779
 
5.5%
16738
 
4.4%
Other values (13) 96189
25.3%
Uppercase Letter
ValueCountFrequency (%)
2091
33.4%
2069
33.0%
2069
33.0%
22
 
0.4%
9
 
0.1%
7
 
0.1%
Other Punctuation
ValueCountFrequency (%)
6267
74.8%
2091
 
25.0%
22
 
0.3%
Space Separator
ValueCountFrequency (%)
79417
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 386378
81.5%
Common 87797
 
18.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
52224
13.5%
45949
11.9%
29128
 
7.5%
27087
 
7.0%
25176
 
6.5%
24983
 
6.5%
20980
 
5.4%
20878
 
5.4%
20779
 
5.4%
16738
 
4.3%
Other values (19) 102456
26.5%
Common
ValueCountFrequency (%)
79417
90.5%
6267
 
7.1%
2091
 
2.4%
22
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 474175
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
79417
16.7%
52224
11.0%
45949
 
9.7%
29128
 
6.1%
27087
 
5.7%
25176
 
5.3%
24983
 
5.3%
20980
 
4.4%
20878
 
4.4%
20779
 
4.4%
Other values (23) 127574
26.9%

situationalJudgement_2
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 26363
Missing (%) 92.5%
Memory size 1.2 MiB

Length

Max length 123
Median length 123
Mean length 118.71422
Min length 7

Characters and Unicode

Total characters 252980
Distinct characters 31
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

Value Count Frequency (%)
2028 7.1%
67 0.2%
19 0.1%
17 0.1%
(Missing) 26363 92.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
4142
 
9.6%
4126
 
9.5%
4056
 
9.4%
2114
 
4.9%
2045
 
4.7%
2045
 
4.7%
2028
 
4.7%
2028
 
4.7%
2028
 
4.7%
2028
 
4.7%
Other values (32) 16725
38.6%

Most occurring characters

ValueCountFrequency (%)
41234
16.3%
26724
10.6%
20388
 
8.1%
16523
 
6.5%
14428
 
5.7%
14393
 
5.7%
14337
 
5.7%
14266
 
5.6%
12380
 
4.9%
12364
 
4.9%
Other values (21) 65943
26.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 207467
82.0%
Space Separator 41234
 
16.3%
Uppercase Letter 2198
 
0.9%
Other Punctuation 2081
 
0.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
26724
12.9%
20388
9.8%
16523
 
8.0%
14428
 
7.0%
14393
 
6.9%
14337
 
6.9%
14266
 
6.9%
12380
 
6.0%
12364
 
6.0%
12312
 
5.9%
Other values (14) 49352
23.8%
Uppercase Letter
ValueCountFrequency (%)
2095
95.3%
67
 
3.0%
19
 
0.9%
17
 
0.8%
Other Punctuation
ValueCountFrequency (%)
2064
99.2%
17
 
0.8%
Space Separator
ValueCountFrequency (%)
41234
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 209665
82.9%
Common 43315
 
17.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
26724
12.7%
20388
 
9.7%
16523
 
7.9%
14428
 
6.9%
14393
 
6.9%
14337
 
6.8%
14266
 
6.8%
12380
 
5.9%
12364
 
5.9%
12312
 
5.9%
Other values (18) 51550
24.6%
Common
ValueCountFrequency (%)
41234
95.2%
2064
 
4.8%
17
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 252980
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
41234
16.3%
26724
10.6%
20388
 
8.1%
16523
 
6.5%
14428
 
5.7%
14393
 
5.7%
14337
 
5.7%
14266
 
5.6%
12380
 
4.9%
12364
 
4.9%
Other values (21) 65943
26.1%
Distinct 4
Distinct (%) 0.2%
Missing 26420
Missing (%) 92.7%
Memory size 1.1 MiB

Length

Max length 129
Median length 110
Mean length 89.832208
Min length 7

Characters and Unicode

Total characters 186312
Distinct characters 31
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

Value Count Frequency (%)
1508 5.3%
401 1.4%
135 0.5%
30 0.1%
(Missing) 26420 92.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
3016
 
8.7%
1913
 
5.5%
1778
 
5.1%
1643
 
4.8%
1643
 
4.8%
1508
 
4.4%
1508
 
4.4%
1508
 
4.4%
1508
 
4.4%
1508
 
4.4%
Other values (26) 17025
49.3%

Most occurring characters

ValueCountFrequency (%)
32619
17.5%
19851
 
10.7%
18613
 
10.0%
13009
 
7.0%
10829
 
5.8%
10424
 
5.6%
8620
 
4.6%
8350
 
4.5%
6842
 
3.7%
6707
 
3.6%
Other values (21) 50448
27.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 147902
79.4%
Space Separator 32619
 
17.5%
Other Punctuation 3286
 
1.8%
Uppercase Letter 2505
 
1.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
19851
13.4%
18613
12.6%
13009
 
8.8%
10829
 
7.3%
10424
 
7.0%
8620
 
5.8%
8350
 
5.6%
6842
 
4.6%
6707
 
4.5%
6572
 
4.4%
Other values (14) 38085
25.8%
Other Punctuation
ValueCountFrequency (%)
1643
50.0%
1508
45.9%
135
 
4.1%
Uppercase Letter
ValueCountFrequency (%)
1538
61.4%
566
 
22.6%
401
 
16.0%
Space Separator
ValueCountFrequency (%)
32619
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 150407
80.7%
Common 35905
 
19.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
19851
13.2%
18613
12.4%
13009
 
8.6%
10829
 
7.2%
10424
 
6.9%
8620
 
5.7%
8350
 
5.6%
6842
 
4.5%
6707
 
4.5%
6572
 
4.4%
Other values (17) 40590
27.0%
Common
ValueCountFrequency (%)
32619
90.8%
1643
 
4.6%
1508
 
4.2%
135
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 186312
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
32619
17.5%
19851
 
10.7%
18613
 
10.0%
13009
 
7.0%
10829
 
5.8%
10424
 
5.6%
8620
 
4.6%
8350
 
4.5%
6842
 
3.7%
6707
 
3.6%
Other values (21) 50448
27.1%

situationalJudgement_4
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 26402
Missing (%) 92.7%
Memory size 1.2 MiB

Length

Max length 139
Median length 139
Mean length 136.46558
Min length 7

Characters and Unicode

Total characters 285486
Distinct characters 32
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

Value Count Frequency (%)
2016 7.1%
37 0.1%
22 0.1%
17 0.1%
(Missing) 26402 92.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
6107
 
11.9%
4069
 
7.9%
4032
 
7.8%
2075
 
4.0%
2070
 
4.0%
2053
 
4.0%
2016
 
3.9%
2016
 
3.9%
2016
 
3.9%
2016
 
3.9%
Other values (33) 22927
44.6%

Most occurring characters

ValueCountFrequency (%)
49342
17.3%
30580
10.7%
28586
10.0%
22605
 
7.9%
18479
 
6.5%
18395
 
6.4%
14395
 
5.0%
14267
 
5.0%
14245
 
5.0%
10250
 
3.6%
Other values (22) 64342
22.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 225838
79.1%
Space Separator 49342
 
17.3%
Other Punctuation 6107
 
2.1%
Uppercase Letter 4125
 
1.4%
Decimal Number 74
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
30580
13.5%
28586
12.7%
22605
10.0%
18479
8.2%
18395
8.1%
14395
 
6.4%
14267
 
6.3%
14245
 
6.3%
10250
 
4.5%
10208
 
4.5%
Other values (12) 43828
19.4%
Uppercase Letter
ValueCountFrequency (%)
2038
49.4%
2016
48.9%
37
 
0.9%
17
 
0.4%
17
 
0.4%
Other Punctuation
ValueCountFrequency (%)
4091
67.0%
2016
33.0%
Decimal Number
ValueCountFrequency (%)
37
50.0%
37
50.0%
Space Separator
ValueCountFrequency (%)
49342
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 229963
80.6%
Common 55523
 
19.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
30580
13.3%
28586
12.4%
22605
9.8%
18479
 
8.0%
18395
 
8.0%
14395
 
6.3%
14267
 
6.2%
14245
 
6.2%
10250
 
4.5%
10208
 
4.4%
Other values (17) 47953
20.9%
Common
ValueCountFrequency (%)
49342
88.9%
4091
 
7.4%
2016
 
3.6%
37
 
0.1%
37
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 285486
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
49342
17.3%
30580
10.7%
28586
10.0%
22605
 
7.9%
18479
 
6.5%
18395
 
6.4%
14395
 
5.0%
14267
 
5.0%
14245
 
5.0%
10250
 
3.6%
Other values (22) 64342
22.5%

situationalJudgement_5
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 26382
Missing (%) 92.6%
Memory size 1.2 MiB

Length

Max length 153
Median length 153
Mean length 139.75284
Min length 18

Characters and Unicode

Total characters 295158
Distinct characters 30
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

Value Count Frequency (%)
1676 5.9%
366 1.3%
60 0.2%
10 < 0.1%
(Missing) 26382 92.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
5780
 
10.3%
3788
 
6.8%
3718
 
6.6%
3352
 
6.0%
2408
 
4.3%
2052
 
3.7%
2042
 
3.6%
2042
 
3.6%
2042
 
3.6%
2042
 
3.6%
Other values (36) 26722
47.7%

Most occurring characters

ValueCountFrequency (%)
54618
18.5%
33844
11.5%
30340
10.3%
25392
 
8.6%
17868
 
6.1%
15002
 
5.1%
13682
 
4.6%
11926
 
4.0%
11550
 
3.9%
10330
 
3.5%
Other values (20) 70606
23.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 226096
76.6%
Space Separator 54618
 
18.5%
Other Punctuation 9924
 
3.4%
Uppercase Letter 4520
 
1.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
33844
15.0%
30340
13.4%
25392
11.2%
17868
 
7.9%
15002
 
6.6%
13682
 
6.1%
11926
 
5.3%
11550
 
5.1%
10330
 
4.6%
9884
 
4.4%
Other values (12) 46278
20.5%
Other Punctuation
ValueCountFrequency (%)
3718
37.5%
2408
24.3%
2112
21.3%
1686
17.0%
Uppercase Letter
ValueCountFrequency (%)
2418
53.5%
2042
45.2%
60
 
1.3%
Space Separator
ValueCountFrequency (%)
54618
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 230616
78.1%
Common 64542
 
21.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
33844
14.7%
30340
13.2%
25392
11.0%
17868
 
7.7%
15002
 
6.5%
13682
 
5.9%
11926
 
5.2%
11550
 
5.0%
10330
 
4.5%
9884
 
4.3%
Other values (15) 50798
22.0%
Common
ValueCountFrequency (%)
54618
84.6%
3718
 
5.8%
2408
 
3.7%
2112
 
3.3%
1686
 
2.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 295158
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
54618
18.5%
33844
11.5%
30340
10.3%
25392
 
8.6%
17868
 
6.1%
15002
 
5.1%
13682
 
4.6%
11926
 
4.0%
11550
 
3.9%
10330
 
3.5%
Other values (20) 70606
23.9%
Distinct 5
Distinct (%) 0.2%
Missing 26395
Missing (%) 92.6%
Memory size 1.2 MiB

Length

Max length 201
Median length 201
Mean length 162.32206
Min length 7

Characters and Unicode

Total characters 340714
Distinct characters 31
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Common Values

Value Count Frequency (%)
1154 4.0%
705 2.5%
227 0.8%
12 < 0.1%
1 < 0.1%
(Missing) 26395 92.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
10936
19.8%
3945
 
7.2%
3731
 
6.8%
3026
 
5.5%
2308
 
4.2%
1859
 
3.4%
1859
 
3.4%
1410
 
2.6%
1167
 
2.1%
1154
 
2.1%
Other values (30) 23700
43.0%

Most occurring characters

ValueCountFrequency (%)
53009
15.6%
46440
13.6%
28861
 
8.5%
20895
 
6.1%
20219
 
5.9%
19078
 
5.6%
18848
 
5.5%
18755
 
5.5%
16514
 
4.8%
14906
 
4.4%
Other values (21) 83189
24.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 279032
81.9%
Space Separator 53009
 
15.6%
Other Punctuation 4436
 
1.3%
Uppercase Letter 4237
 
1.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
46440
16.6%
28861
10.3%
20895
 
7.5%
20219
 
7.2%
19078
 
6.8%
18848
 
6.8%
18755
 
6.7%
16514
 
5.9%
14906
 
5.3%
13501
 
4.8%
Other values (13) 61015
21.9%
Uppercase Letter
ValueCountFrequency (%)
2603
61.4%
1167
27.5%
227
 
5.4%
227
 
5.4%
13
 
0.3%
Other Punctuation
ValueCountFrequency (%)
3731
84.1%
705
 
15.9%
Space Separator
ValueCountFrequency (%)
53009
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 283269
83.1%
Common 57445
 
16.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
46440
16.4%
28861
10.2%
20895
 
7.4%
20219
 
7.1%
19078
 
6.7%
18848
 
6.7%
18755
 
6.6%
16514
 
5.8%
14906
 
5.3%
13501
 
4.8%
Other values (18) 65252
23.0%
Common
ValueCountFrequency (%)
53009
92.3%
3731
 
6.5%
705
 
1.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 340714
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
53009
15.6%
46440
13.6%
28861
 
8.5%
20895
 
6.1%
20219
 
5.9%
19078
 
5.6%
18848
 
5.5%
18755
 
5.5%
16514
 
4.8%
14906
 
4.4%
Other values (21) 83189
24.4%

situationalJudgement_7
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.2%
Missing 26382
Missing (%) 92.6%
Memory size 1.5 MiB

Length

Max length 280
Median length 280
Mean length 264.79261
Min length 29

Characters and Unicode

Total characters 559242
Distinct characters 32
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

Value Count Frequency (%)
1914 6.7%
157 0.6%
21 0.1%
20 0.1%
(Missing) 26382 92.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
9988
 
10.6%
5742
 
6.1%
4027
 
4.3%
4006
 
4.3%
2249
 
2.4%
2092
 
2.2%
2092
 
2.2%
2071
 
2.2%
2071
 
2.2%
2071
 
2.2%
Other values (55) 57764
61.3%

Most occurring characters

ValueCountFrequency (%)
94132
16.8%
64481
11.5%
49911
 
8.9%
35026
 
6.3%
34391
 
6.1%
29683
 
5.3%
29619
 
5.3%
27800
 
5.0%
24272
 
4.3%
22142
 
4.0%
Other values (22) 147785
26.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 449245
80.3%
Space Separator 94132
 
16.8%
Other Punctuation 8011
 
1.4%
Uppercase Letter 7854
 
1.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
64481
14.4%
49911
11.1%
35026
 
7.8%
34391
 
7.7%
29683
 
6.6%
29619
 
6.6%
27800
 
6.2%
24272
 
5.4%
22142
 
4.9%
19831
 
4.4%
Other values (14) 112089
25.0%
Uppercase Letter
ValueCountFrequency (%)
3985
50.7%
1914
24.4%
1914
24.4%
21
 
0.3%
20
 
0.3%
Other Punctuation
ValueCountFrequency (%)
5940
74.1%
2071
 
25.9%
Space Separator
ValueCountFrequency (%)
94132
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 457099
81.7%
Common 102143
 
18.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
64481
14.1%
49911
10.9%
35026
 
7.7%
34391
 
7.5%
29683
 
6.5%
29619
 
6.5%
27800
 
6.1%
24272
 
5.3%
22142
 
4.8%
19831
 
4.3%
Other values (19) 119943
26.2%
Common
ValueCountFrequency (%)
94132
92.2%
5940
 
5.8%
2071
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 559242
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
94132
16.8%
64481
11.5%
49911
 
8.9%
35026
 
6.3%
34391
 
6.1%
29683
 
5.3%
29619
 
5.3%
27800
 
5.0%
24272
 
4.3%
22142
 
4.0%
Other values (22) 147785
26.4%
Distinct 4
Distinct (%) 0.2%
Missing 26406
Missing (%) 92.7%
Memory size 1.1 MiB

Length

Max length 130
Median length 98
Mean length 76.713602
Min length 7

Characters and Unicode

Total characters 160178
Distinct characters 30
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

Value Count Frequency (%)
935 3.3%
548 1.9%
403 1.4%
202 0.7%
(Missing) 26406 92.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2878
 
9.4%
1943
 
6.3%
1886
 
6.1%
1741
 
5.7%
1556
 
5.1%
1338
 
4.3%
1338
 
4.3%
1338
 
4.3%
1338
 
4.3%
1338
 
4.3%
Other values (26) 14077
45.7%

Most occurring characters

ValueCountFrequency (%)
28683
17.9%
17082
 
10.7%
16736
 
10.4%
9457
 
5.9%
8836
 
5.5%
7771
 
4.9%
7699
 
4.8%
6691
 
4.2%
6233
 
3.9%
4966
 
3.1%
Other values (20) 46024
28.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 124514
77.7%
Space Separator 28683
 
17.9%
Other Punctuation 4345
 
2.7%
Uppercase Letter 2636
 
1.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
17082
13.7%
16736
13.4%
9457
 
7.6%
8836
 
7.1%
7771
 
6.2%
7699
 
6.2%
6691
 
5.4%
6233
 
5.0%
4966
 
4.0%
4893
 
3.9%
Other values (13) 34150
27.4%
Other Punctuation
ValueCountFrequency (%)
1870
43.0%
1540
35.4%
935
21.5%
Uppercase Letter
ValueCountFrequency (%)
1483
56.3%
605
23.0%
548
 
20.8%
Space Separator
ValueCountFrequency (%)
28683
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 127150
79.4%
Common 33028
 
20.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
17082
13.4%
16736
13.2%
9457
 
7.4%
8836
 
6.9%
7771
 
6.1%
7699
 
6.1%
6691
 
5.3%
6233
 
4.9%
4966
 
3.9%
4893
 
3.8%
Other values (16) 36786
28.9%
Common
ValueCountFrequency (%)
28683
86.8%
1870
 
5.7%
1540
 
4.7%
935
 
2.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 160178
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
28683
17.9%
17082
 
10.7%
16736
 
10.4%
9457
 
5.9%
8836
 
5.5%
7771
 
4.9%
7699
 
4.8%
6691
 
4.2%
6233
 
3.9%
4966
 
3.1%
Other values (20) 46024
28.7%

situationalJudgement_9
Categorical

IMBALANCE  MISSING 

Distinct 4
Distinct (%) 0.1%
Missing 25820
Missing (%) 90.6%
Memory size 1.6 MiB

Length

Max length 249
Median length 249
Mean length 242.35453
Min length 37

Characters and Unicode

Total characters 648056
Distinct characters 36
Distinct categories 6
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

Value Count Frequency (%)
2521 8.8%
106 0.4%
26 0.1%
21 0.1%
(Missing) 25820 90.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
10343
 
9.9%
7563
 
7.2%
5195
 
5.0%
5068
 
4.9%
2759
 
2.6%
2653
 
2.5%
2653
 
2.5%
2627
 
2.5%
2627
 
2.5%
2627
 
2.5%
Other values (39) 60369
57.8%

Most occurring characters

ValueCountFrequency (%)
104331
16.1%
62075
 
9.6%
54543
 
8.4%
44567
 
6.9%
44070
 
6.8%
36547
 
5.6%
36187
 
5.6%
31093
 
4.8%
28524
 
4.4%
28414
 
4.4%
Other values (26) 177705
27.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 522615
80.6%
Space Separator 104331
 
16.1%
Other Punctuation 12970
 
2.0%
Uppercase Letter 7822
 
1.2%
Decimal Number 212
 
< 0.1%
Currency Symbol 106
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
62075
11.9%
54543
10.4%
44567
 
8.5%
44070
 
8.4%
36547
 
7.0%
36187
 
6.9%
31093
 
5.9%
28524
 
5.5%
28414
 
5.4%
23414
 
4.5%
Other values (14) 133181
25.5%
Uppercase Letter
ValueCountFrequency (%)
2547
32.6%
2521
32.2%
2521
32.2%
127
 
1.6%
106
 
1.4%
Other Punctuation
ValueCountFrequency (%)
7822
60.3%
2627
 
20.3%
2521
 
19.4%
Decimal Number
ValueCountFrequency (%)
106
50.0%
106
50.0%
Space Separator
ValueCountFrequency (%)
104331
100.0%
Currency Symbol
ValueCountFrequency (%)
106
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 530437
81.9%
Common 117619
 
18.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
62075
11.7%
54543
 
10.3%
44567
 
8.4%
44070
 
8.3%
36547
 
6.9%
36187
 
6.8%
31093
 
5.9%
28524
 
5.4%
28414
 
5.4%
23414
 
4.4%
Other values (19) 141003
26.6%
Common
ValueCountFrequency (%)
104331
88.7%
7822
 
6.7%
2627
 
2.2%
2521
 
2.1%
106
 
0.1%
106
 
0.1%
106
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 648056
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
104331
16.1%
62075
 
9.6%
54543
 
8.4%
44567
 
6.9%
44070
 
6.8%
36547
 
5.6%
36187
 
5.6%
31093
 
4.8%
28524
 
4.4%
28414
 
4.4%
Other values (26) 177705
27.4%

total
Real number (ℝ)

Distinct 16
Distinct (%) 0.1%
Missing 11229
Missing (%) 39.4%
Infinite 0
Infinite (%) 0.0%
Mean 15.286476
Minimum 0
Maximum 33
Zeros 3
Zeros (%) < 0.1%
Negative 0
Negative (%) 0.0%
Memory size 222.7 KiB

Quantile statistics

Minimum 0
5-th percentile 10
Q1 11
Median 13
Q3 15
95-th percentile 33
Maximum 33
Range 33
Interquartile range (IQR) 4

Descriptive statistics

Standard deviation 7.0989102
Coefficient of variation (CV) 0.46439156
Kurtosis 2.1940269
Mean 15.286476
Median Absolute Deviation (MAD) 2
Skewness 1.9351038
Sum 263921
Variance 50.394526
Monotonicity Not monotonic
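
Several of the statistics reported for `total` follow from one another, so they can be cross-checked using only the reported figures. The sketch below recomputes the non-missing count, missing percentage, coefficient of variation, variance, sum, range and IQR. All inputs are copied from this report (28494 is the number of observations in the dataset overview), and the variable names are illustrative.

```python
# Reported figures for the numeric variable `total`.
n_observations = 28494
n_missing = 11229
mean = 15.286476
std = 7.0989102
minimum, maximum = 0, 33
q1, q3 = 11, 15

n_valid = n_observations - n_missing        # rows that actually carry a value
missing_pct = 100 * n_missing / n_observations
cv = std / mean                             # coefficient of variation
variance = std ** 2
total_sum = mean * n_valid                  # sum recovered from mean * count
value_range = maximum - minimum
iqr = q3 - q1

print(f"valid={n_valid} missing={missing_pct:.1f}% cv={cv:.8f}")
print(f"variance={variance:.6f} sum={total_sum:.0f} range={value_range} iqr={iqr}")
```

The recomputed values agree with the reported CV, variance, sum, range and IQR to the precision shown in the report.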
Histogram with fixed size bins (bins=16)

Value Count Frequency (%)
11 3778 13.3%
14 3432 12.0%
15 2740 9.6%
33 2276 8.0%
12 2117 7.4%
10 1552 5.4%
13 1262 4.4%
9 47 0.2%
8 17 0.1%
7 17 0.1%
Other values (6) 27 0.1%
(Missing) 11229 39.4%

Value Count Frequency (%)
0 3 < 0.1%
1 6 < 0.1%
2 1 < 0.1%
4 2 < 0.1%
5 7 < 0.1%
6 8 < 0.1%
7 17 0.1%
8 17 0.1%
9 47 0.2%
10 1552 5.4%

Value Count Frequency (%)
33 2276 8.0%
15 2740 9.6%
14 3432 12.0%
13 1262 4.4%
12 2117 7.4%
11 3778 13.3%
10 1552 5.4%
9 47 0.2%
8 17 0.1%
7 17 0.1%

trial
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct 2810
Distinct (%) 91.3%
Missing 25415
Missing (%) 89.2%
Memory size 1.3 MiB

Length

Max length 427
Median length 382
Mean length 137.1062
Min length 51

Characters and Unicode

Total characters 422150
Distinct characters 33
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 2636
Unique (%) 85.6%

Common Values

Value Count Frequency (%)
11 < 0.1%
10 < 0.1%
7 < 0.1%
6 < 0.1%
6 < 0.1%
6 < 0.1%
6 < 0.1%
5 < 0.1%
5 < 0.1%
5 < 0.1%
Other values (2800) 3012 10.6%
(Missing) 25415 89.2%
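
The percentages for `trial` use two different denominators, which is easy to miss. Missing (%) is relative to all 28494 observations, while the reported Distinct (%) and Unique (%) are reproduced only when dividing by the non-missing count. The check below uses the figures from this report and takes Unique to mean values that occur exactly once. The variable names are illustrative.

```python
n_observations = 28494   # number of observations in the dataset overview
n_missing = 25415
n_distinct = 2810
n_unique = 2636          # values that occur exactly once

n_present = n_observations - n_missing

# Missing (%) is taken over all rows ...
missing_pct = 100 * n_missing / n_observations
# ... while Distinct (%) and Unique (%) are taken over non-missing rows only.
distinct_pct = 100 * n_distinct / n_present
unique_pct = 100 * n_unique / n_present

print(f"present={n_present} missing={missing_pct:.1f}% "
      f"distinct={distinct_pct:.1f}% unique={unique_pct:.1f}%")
```

The same denominators explain why a variable with only 4 distinct values can still show Distinct (%) of 0.1% to 0.2% despite more than 90% of rows being missing.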

Length

Histogram of lengths of the category
ValueCountFrequency (%)
11
 
0.4%
10
 
0.3%
7
 
0.2%
6
 
0.2%
6
 
0.2%
6
 
0.2%
6
 
0.2%
5
 
0.2%
5
 
0.2%
5
 
0.2%
Other values (2800) 3012
97.8%

Most occurring characters

ValueCountFrequency (%)
93828
22.2%
31276
 
7.4%
29951
 
7.1%
25019
 
5.9%
23457
 
5.6%
23457
 
5.6%
20378
 
4.8%
15638
 
3.7%
15638
 
3.7%
15638
 
3.7%
Other values (23) 127870
30.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 181560
43.0%
Other Punctuation 153301
36.3%
Decimal Number 65493
 
15.5%
Open Punctuation 10898
 
2.6%
Close Punctuation 10898
 
2.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
31276
17.2%
29951
16.5%
23457
12.9%
15638
8.6%
15638
8.6%
12590
6.9%
12590
6.9%
7819
 
4.3%
7819
 
4.3%
7819
 
4.3%
Other values (5) 16963
9.3%
Decimal Number
ValueCountFrequency (%)
25019
38.2%
6322
 
9.7%
6105
 
9.3%
5707
 
8.7%
4729
 
7.2%
4662
 
7.1%
4396
 
6.7%
3824
 
5.8%
2466
 
3.8%
2263
 
3.5%
Other Punctuation
ValueCountFrequency (%)
93828
61.2%
23457
 
15.3%
20378
 
13.3%
15638
 
10.2%
Open Punctuation
ValueCountFrequency (%)
7819
71.7%
3079
 
28.3%
Close Punctuation
ValueCountFrequency (%)
7819
71.7%
3079
 
28.3%

Most occurring scripts

ValueCountFrequency (%)
Common 240590
57.0%
Latin 181560
43.0%

Most frequent character per script

Common
ValueCountFrequency (%)
93828
39.0%
25019
 
10.4%
23457
 
9.7%
20378
 
8.5%
15638
 
6.5%
7819
 
3.2%
7819
 
3.2%
6322
 
2.6%
6105
 
2.5%
5707
 
2.4%
Other values (8) 28498
 
11.8%
Latin
ValueCountFrequency (%)
31276
17.2%
29951
16.5%
23457
12.9%
15638
8.6%
15638
8.6%
12590
6.9%
12590
6.9%
7819
 
4.3%
7819
 
4.3%
7819
 
4.3%
Other values (5) 16963
9.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 422150
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
93828
22.2%
31276
 
7.4%
29951
 
7.1%
25019
 
5.9%
23457
 
5.6%
23457
 
5.6%
20378
 
4.8%
15638
 
3.7%
15638
 
3.7%
15638
 
3.7%
Other values (23) 127870
30.3%

type
Categorical

Distinct 12
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.9 MiB

Length

Max length 38
Median length 17
Mean length 12.326525
Min length 5

Characters and Unicode

Total characters 351232
Distinct characters 36
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
3797
13.3%
3750
13.2%
3535
12.4%
3282
11.5%
3079
10.8%
2741
9.6%
2276
8.0%
2140
7.5%
1812
6.4%
1168
 
4.1%
Other values (2) 914
 
3.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
3797
13.3%
3750
13.2%
3535
12.4%
3282
11.5%
3079
10.8%
2741
9.6%
2276
8.0%
2140
7.5%
1812
6.4%
1168
 
4.1%
Other values (2) 914
 
3.2%

Most occurring characters

ValueCountFrequency (%)
39570
11.3%
37958
10.8%
36463
10.4%
30219
 
8.6%
21969
 
6.3%
21714
 
6.2%
21126
 
6.0%
20722
 
5.9%
19178
 
5.5%
15255
 
4.3%
Other values (26) 87058
24.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 335755
95.6%
Uppercase Letter 11478
 
3.3%
Dash Punctuation 3991
 
1.1%
Decimal Number 6
 
< 0.1%
Connector Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
39570
11.8%
37958
11.3%
36463
10.9%
30219
9.0%
21969
 
6.5%
21714
 
6.5%
21126
 
6.3%
20722
 
6.2%
19178
 
5.7%
15255
 
4.5%
Other values (12) 71581
21.3%
Uppercase Letter
ValueCountFrequency (%)
4918
42.8%
3797
33.1%
2743
23.9%
4
 
< 0.1%
4
 
< 0.1%
4
 
< 0.1%
4
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
2
33.3%
2
33.3%
2
33.3%
Dash Punctuation
ValueCountFrequency (%)
3991
100.0%
Connector Punctuation
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 347233
98.9%
Common 3999
 
1.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
39570
11.4%
37958
10.9%
36463
10.5%
30219
 
8.7%
21969
 
6.3%
21714
 
6.3%
21126
 
6.1%
20722
 
6.0%
19178
 
5.5%
15255
 
4.4%
Other values (21) 83059
23.9%
Common
ValueCountFrequency (%)
3991
99.8%
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 351232
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
39570
11.3%
37958
10.8%
36463
10.4%
30219
 
8.6%
21969
 
6.3%
21714
 
6.2%
21126
 
6.0%
20722
 
5.9%
19178
 
5.5%
15255
 
4.3%
Other values (26) 87058
24.8%

user
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct 3835
Distinct (%) 13.5%
Missing 17
Missing (%) 0.1%
Memory size 2.0 MiB

Length

Max length 17
Median length 17
Mean length 17
Min length 17

Characters and Unicode

Total characters 484109
Distinct characters 55
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 68
Unique (%) 0.2%
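
The summary figures for `user` are consistent with the following reading: Missing (%) is taken over all 28494 observations (17 / 28494 ≈ 0.1%), while Distinct (%) and Unique (%) are taken over the 28477 non-missing ones (3835 / 28477 ≈ 13.5% and 68 / 28477 ≈ 0.2%), with "unique" meaning a value that occurs exactly once. The sketch below reproduces that arithmetic under those assumptions. The function name `summarize` and the sample data are hypothetical, and missing values are assumed to be `None`.

```python
from collections import Counter


def summarize(values):
    """Compute distinct/missing/unique counts and percentages for one column.

    Assumptions (inferred from the figures in this report, not from the
    report generator's source): percentages for distinct and unique use the
    non-missing count as denominator; missing uses the total count.
    """
    n = len(values)
    present = [v for v in values if v is not None]
    counts = Counter(present)

    distinct = len(counts)                                # values seen at least once
    unique = sum(1 for c in counts.values() if c == 1)    # values seen exactly once
    missing = n - len(present)

    n_present = len(present)
    return {
        "distinct": distinct,
        "distinct_pct": 100 * distinct / n_present if n_present else 0.0,
        "missing": missing,
        "missing_pct": 100 * missing / n if n else 0.0,
        "unique": unique,
        "unique_pct": 100 * unique / n_present if n_present else 0.0,
    }


# Hypothetical column: "a" repeats, "b" and "c" occur once, one cell is missing.
print(summarize(["a", "a", "b", "c", None]))
```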

Common Values

ValueCountFrequency (%)
19
 
0.1%
17
 
0.1%
16
 
0.1%
14
 
< 0.1%
13
 
< 0.1%
11
 
< 0.1%
11
 
< 0.1%
11
 
< 0.1%
11
 
< 0.1%
11
 
< 0.1%
Other values (3825) 28343
99.5%
(Missing) 17
 
0.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
19
 
0.1%
17
 
0.1%
16
 
0.1%
14
 
< 0.1%
13
 
< 0.1%
11
 
< 0.1%
11
 
< 0.1%
11
 
< 0.1%
11
 
< 0.1%
11
 
< 0.1%
Other values (3825) 28343
99.5%

Most occurring characters

ValueCountFrequency (%)
9339
 
1.9%
9222
 
1.9%
9219
 
1.9%
9207
 
1.9%
9199
 
1.9%
9182
 
1.9%
9147
 
1.9%
9076
 
1.9%
9051
 
1.9%
9041
 
1.9%
Other values (45) 392426
81.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 220526
45.6%
Uppercase Letter 193122
39.9%
Decimal Number 70461
 
14.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
9339
 
4.2%
9222
 
4.2%
9199
 
4.2%
9182
 
4.2%
9147
 
4.1%
9041
 
4.1%
9022
 
4.1%
8993
 
4.1%
8981
 
4.1%
8880
 
4.0%
Other values (15) 129520
58.7%
Uppercase Letter
ValueCountFrequency (%)
9207
 
4.8%
9076
 
4.7%
9051
 
4.7%
8988
 
4.7%
8899
 
4.6%
8873
 
4.6%
8852
 
4.6%
8839
 
4.6%
8790
 
4.6%
8762
 
4.5%
Other values (12) 103785
53.7%
Decimal Number
ValueCountFrequency (%)
9219
13.1%
9001
12.8%
8849
12.6%
8848
12.6%
8765
12.4%
8710
12.4%
8699
12.3%
8370
11.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 413648
85.4%
Common 70461
 
14.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
9339
 
2.3%
9222
 
2.2%
9207
 
2.2%
9199
 
2.2%
9182
 
2.2%
9147
 
2.2%
9076
 
2.2%
9051
 
2.2%
9041
 
2.2%
9022
 
2.2%
Other values (37) 322162
77.9%
Common
ValueCountFrequency (%)
9219
13.1%
9001
12.8%
8849
12.6%
8848
12.6%
8765
12.4%
8710
12.4%
8699
12.3%
8370
11.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 484109
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
9339
 
1.9%
9222
 
1.9%
9219
 
1.9%
9207
 
1.9%
9199
 
1.9%
9182
 
1.9%
9147
 
1.9%
9076
 
1.9%
9051
 
1.9%
9041
 
1.9%
Other values (45) 392426
81.1%

voice_1.GCSData
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 28494
Missing (%) 100.0%
Memory size 222.7 KiB

Distinct 3121
Distinct (%) 99.9%
Missing 25369
Missing (%) 89.0%
Memory size 1.1 MiB
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3111) 3111
 
10.9%
(Missing) 25369
89.0%
ValueCountFrequency (%)
3125
 
11.0%
(Missing) 25369
89.0%
ValueCountFrequency (%)
3125
 
11.0%
(Missing) 25369
89.0%
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3111) 3111
 
10.9%
(Missing) 25369
89.0%
ValueCountFrequency (%)
3125
 
11.0%
(Missing) 25369
89.0%
ValueCountFrequency (%)
3125
 
11.0%
(Missing) 25369
89.0%

voice_1.fileName
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct 3139
Distinct (%) 99.8%
Missing 25350
Missing (%) 89.0%
Memory size 1.0 MiB

Length

Max length 25
Median length 25
Mean length 25
Min length 25

Characters and Unicode

Total characters 78600
Distinct characters 58
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 3134
Unique (%) 99.7%

Common Values

ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3129) 3129
 
11.0%
(Missing) 25350
89.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3129) 3129
99.5%

Most occurring characters

ValueCountFrequency (%)
6288
 
8.0%
4153
 
5.3%
4134
 
5.3%
4107
 
5.2%
4069
 
5.2%
4064
 
5.2%
2646
 
3.4%
2221
 
2.8%
1225
 
1.6%
1065
 
1.4%
Other values (48) 44628
56.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 40010
50.9%
Uppercase Letter 21446
27.3%
Decimal Number 10856
 
13.8%
Connector Punctuation 6288
 
8.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
4153
 
10.4%
4134
 
10.3%
4107
 
10.3%
4069
 
10.2%
4064
 
10.2%
1017
 
2.5%
1016
 
2.5%
1009
 
2.5%
1003
 
2.5%
998
 
2.5%
Other values (15) 14440
36.1%
Uppercase Letter
ValueCountFrequency (%)
1065
 
5.0%
1033
 
4.8%
994
 
4.6%
993
 
4.6%
992
 
4.6%
990
 
4.6%
984
 
4.6%
983
 
4.6%
983
 
4.6%
980
 
4.6%
Other values (12) 11449
53.4%
Decimal Number
ValueCountFrequency (%)
2646
24.4%
2221
20.5%
1225
11.3%
963
 
8.9%
956
 
8.8%
946
 
8.7%
945
 
8.7%
913
 
8.4%
37
 
0.3%
4
 
< 0.1%
Connector Punctuation
ValueCountFrequency (%)
6288
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 61456
78.2%
Common 17144
 
21.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
4153
 
6.8%
4134
 
6.7%
4107
 
6.7%
4069
 
6.6%
4064
 
6.6%
1065
 
1.7%
1033
 
1.7%
1017
 
1.7%
1016
 
1.7%
1009
 
1.6%
Other values (37) 35789
58.2%
Common
ValueCountFrequency (%)
6288
36.7%
2646
15.4%
2221
 
13.0%
1225
 
7.1%
963
 
5.6%
956
 
5.6%
946
 
5.5%
945
 
5.5%
913
 
5.3%
37
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 78600
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6288
 
8.0%
4153
 
5.3%
4134
 
5.3%
4107
 
5.2%
4069
 
5.2%
4064
 
5.2%
2646
 
3.4%
2221
 
2.8%
1225
 
1.6%
1065
 
1.4%
Other values (48) 44628
56.8%

voice_1.prompt
Categorical

IMBALANCE  MISSING 

Distinct 6
Distinct (%) 0.2%
Missing 25350
Missing (%) 89.0%
Memory size 1.9 MiB

Length

Max length 322
Median length 305
Mean length 308.32824
Min length 57

Characters and Unicode

Total characters 969384
Distinct characters 41
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1884
 
6.6%
1215
 
4.3%
35
 
0.1%
5
 
< 0.1%
3
 
< 0.1%
2
 
< 0.1%
(Missing) 25350
89.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
14375
 
8.1%
13632
 
7.7%
10517
 
5.9%
6203
 
3.5%
3778
 
2.1%
3773
 
2.1%
3771
 
2.1%
3768
 
2.1%
3680
 
2.1%
3645
 
2.0%
Other values (109) 110730
62.3%

Most occurring characters

ValueCountFrequency (%)
174728
18.0%
81105
 
8.4%
78971
 
8.1%
74848
 
7.7%
59400
 
6.1%
52740
 
5.4%
43514
 
4.5%
43510
 
4.5%
38033
 
3.9%
37288
 
3.8%
Other values (31) 285247
29.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 758358
78.2%
Space Separator 174728
 
18.0%
Other Punctuation 17530
 
1.8%
Uppercase Letter 13116
 
1.4%
Decimal Number 5652
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
81105
 
10.7%
78971
 
10.4%
74848
 
9.9%
59400
 
7.8%
52740
 
7.0%
43514
 
5.7%
43510
 
5.7%
38033
 
5.0%
37288
 
4.9%
34958
 
4.6%
Other values (14) 213991
28.2%
Uppercase Letter
ValueCountFrequency (%)
5018
38.3%
3702
28.2%
1886
 
14.4%
1215
 
9.3%
1215
 
9.3%
73
 
0.6%
5
 
< 0.1%
2
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
10602
60.5%
3780
 
21.6%
1884
 
10.7%
1224
 
7.0%
40
 
0.2%
Decimal Number
ValueCountFrequency (%)
1884
33.3%
1884
33.3%
1884
33.3%
Space Separator
ValueCountFrequency (%)
174728
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 771474
79.6%
Common 197910
 
20.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
81105
 
10.5%
78971
 
10.2%
74848
 
9.7%
59400
 
7.7%
52740
 
6.8%
43514
 
5.6%
43510
 
5.6%
38033
 
4.9%
37288
 
4.8%
34958
 
4.5%
Other values (22) 227107
29.4%
Common
ValueCountFrequency (%)
174728
88.3%
10602
 
5.4%
3780
 
1.9%
1884
 
1.0%
1884
 
1.0%
1884
 
1.0%
1884
 
1.0%
1224
 
0.6%
40
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 969384
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
174728
18.0%
81105
 
8.4%
78971
 
8.1%
74848
 
7.7%
59400
 
6.1%
52740
 
5.4%
43514
 
4.5%
43510
 
4.5%
38033
 
3.9%
37288
 
3.8%
Other values (31) 285247
29.4%

Distinct 3253
Distinct (%) 99.9%
Missing 25237
Missing (%) 88.6%
Memory size 1.1 MiB
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3243) 3243
 
11.4%
(Missing) 25237
88.6%
ValueCountFrequency (%)
3257
 
11.4%
(Missing) 25237
88.6%
ValueCountFrequency (%)
3257
 
11.4%
(Missing) 25237
88.6%
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3243) 3243
 
11.4%
(Missing) 25237
88.6%
ValueCountFrequency (%)
3257
 
11.4%
(Missing) 25237
88.6%
ValueCountFrequency (%)
3257
 
11.4%
(Missing) 25237
88.6%

voice_10.fileName
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct 3277
Distinct (%) 99.8%
Missing 25212
Missing (%) 88.5%
Memory size 1.0 MiB

Length

Max length 26
Median length 26
Mean length 25.884522
Min length 25

Characters and Unicode

Total characters 84953
Distinct characters 58
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 3272
Unique (%) 99.7%

Common Values

ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3267) 3267
 
11.5%
(Missing) 25212
88.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3267) 3267
99.5%

Most occurring characters

ValueCountFrequency (%)
6564
 
7.7%
4347
 
5.1%
4313
 
5.1%
4283
 
5.0%
4246
 
5.0%
4244
 
5.0%
2903
 
3.4%
2903
 
3.4%
1307
 
1.5%
1110
 
1.3%
Other values (48) 48733
57.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 41758
49.2%
Uppercase Letter 22399
26.4%
Decimal Number 14232
 
16.8%
Connector Punctuation 6564
 
7.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
4347
 
10.4%
4313
 
10.3%
4283
 
10.3%
4246
 
10.2%
4244
 
10.2%
1081
 
2.6%
1058
 
2.5%
1054
 
2.5%
1054
 
2.5%
1038
 
2.5%
Other values (15) 15040
36.0%
Uppercase Letter
ValueCountFrequency (%)
1110
 
5.0%
1079
 
4.8%
1040
 
4.6%
1038
 
4.6%
1032
 
4.6%
1032
 
4.6%
1029
 
4.6%
1027
 
4.6%
1025
 
4.6%
1023
 
4.6%
Other values (12) 11964
53.4%
Decimal Number
ValueCountFrequency (%)
2903
20.4%
2903
20.4%
1307
9.2%
1059
 
7.4%
1048
 
7.4%
1014
 
7.1%
1007
 
7.1%
1002
 
7.0%
1000
 
7.0%
989
 
6.9%
Connector Punctuation
ValueCountFrequency (%)
6564
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 64157
75.5%
Common 20796
 
24.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
4347
 
6.8%
4313
 
6.7%
4283
 
6.7%
4246
 
6.6%
4244
 
6.6%
1110
 
1.7%
1081
 
1.7%
1079
 
1.7%
1058
 
1.6%
1054
 
1.6%
Other values (37) 37342
58.2%
Common
ValueCountFrequency (%)
6564
31.6%
2903
14.0%
2903
14.0%
1307
 
6.3%
1059
 
5.1%
1048
 
5.0%
1014
 
4.9%
1007
 
4.8%
1002
 
4.8%
1000
 
4.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 84953
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6564
 
7.7%
4347
 
5.1%
4313
 
5.1%
4283
 
5.0%
4246
 
5.0%
4244
 
5.0%
2903
 
3.4%
2903
 
3.4%
1307
 
1.5%
1110
 
1.3%
Other values (48) 48733
57.4%

voice_10.prompt
Categorical

IMBALANCE  MISSING 

Distinct 3
Distinct (%) 0.1%
Missing 25212
Missing (%) 88.5%
Memory size 1.1 MiB

Length

Max length 349
Median length 34
Mean length 34.898537
Min length 34

Characters and Unicode

Total characters 114537
Distinct characters 29
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
3244
 
11.4%
31
 
0.1%
7
 
< 0.1%
(Missing) 25212
88.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
3282
16.2%
3275
16.2%
3275
16.2%
3244
16.0%
3244
16.0%
3244
16.0%
52
 
0.3%
38
 
0.2%
35
 
0.2%
31
 
0.2%
Other values (48) 516
 
2.5%

Most occurring characters

ValueCountFrequency (%)
16954
14.8%
10045
 
8.8%
10006
 
8.7%
9781
 
8.5%
6834
 
6.0%
6725
 
5.9%
6676
 
5.8%
6661
 
5.8%
6651
 
5.8%
6509
 
5.7%
Other values (19) 27695
24.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 90921
79.4%
Space Separator 16954
 
14.8%
Other Punctuation 3338
 
2.9%
Uppercase Letter 3324
 
2.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
10045
11.0%
10006
11.0%
9781
10.8%
6834
 
7.5%
6725
 
7.4%
6676
 
7.3%
6661
 
7.3%
6651
 
7.3%
6509
 
7.2%
3456
 
3.8%
Other values (12) 17577
19.3%
Other Punctuation
ValueCountFrequency (%)
3282
98.3%
28
 
0.8%
21
 
0.6%
7
 
0.2%
Uppercase Letter
ValueCountFrequency (%)
3275
98.5%
49
 
1.5%
Space Separator
ValueCountFrequency (%)
16954
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 94245
82.3%
Common 20292
 
17.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
10045
10.7%
10006
10.6%
9781
10.4%
6834
 
7.3%
6725
 
7.1%
6676
 
7.1%
6661
 
7.1%
6651
 
7.1%
6509
 
6.9%
3456
 
3.7%
Other values (14) 20901
22.2%
Common
ValueCountFrequency (%)
16954
83.6%
3282
 
16.2%
28
 
0.1%
21
 
0.1%
7
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 114537
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
16954
14.8%
10045
 
8.8%
10006
 
8.7%
9781
 
8.5%
6834
 
6.0%
6725
 
5.9%
6676
 
5.8%
6661
 
5.8%
6651
 
5.8%
6509
 
5.7%
Other values (19) 27695
24.2%

voice_2.GCSData
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 28494
Missing (%) 100.0%
Memory size 222.7 KiB

Distinct 2919
Distinct (%) 99.9%
Missing 25572
Missing (%) 89.7%
Memory size 1.1 MiB
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (2909) 2909
 
10.2%
(Missing) 25572
89.7%
ValueCountFrequency (%)
2922
 
10.3%
(Missing) 25572
89.7%
ValueCountFrequency (%)
2922
 
10.3%
(Missing) 25572
89.7%
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (2909) 2909
 
10.2%
(Missing) 25572
89.7%
ValueCountFrequency (%)
2922
 
10.3%
(Missing) 25572
89.7%
ValueCountFrequency (%)
2922
 
10.3%
(Missing) 25572
89.7%

voice_2.fileName
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct 2923
Distinct (%) 99.9%
Missing 25567
Missing (%) 89.7%
Memory size 1.0 MiB

Length

Max length 25
Median length 25
Mean length 25
Min length 25

Characters and Unicode

Total characters 73175
Distinct characters 58
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 2919
Unique (%) 99.7%

Common Values

ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (2913) 2913
 
10.2%
(Missing) 25567
89.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (2913) 2913
99.5%

Most occurring characters

ValueCountFrequency (%)
5854
 
8.0%
3861
 
5.3%
3849
 
5.3%
3816
 
5.2%
3787
 
5.2%
3785
 
5.2%
2888
 
3.9%
998
 
1.4%
957
 
1.3%
948
 
1.3%
Other values (48) 42432
58.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 37218
50.9%
Uppercase Letter 19987
27.3%
Decimal Number 10116
 
13.8%
Connector Punctuation 5854
 
8.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
3861
 
10.4%
3849
 
10.3%
3816
 
10.3%
3787
 
10.2%
3785
 
10.2%
947
 
2.5%
946
 
2.5%
933
 
2.5%
929
 
2.5%
927
 
2.5%
Other values (15) 13438
36.1%
Uppercase Letter
ValueCountFrequency (%)
998
 
5.0%
957
 
4.8%
942
 
4.7%
931
 
4.7%
926
 
4.6%
926
 
4.6%
920
 
4.6%
919
 
4.6%
911
 
4.6%
907
 
4.5%
Other values (12) 10650
53.3%
Decimal Number
ValueCountFrequency (%)
2888
28.5%
948
 
9.4%
937
 
9.3%
915
 
9.0%
896
 
8.9%
889
 
8.8%
881
 
8.7%
879
 
8.7%
844
 
8.3%
39
 
0.4%
Connector Punctuation
ValueCountFrequency (%)
5854
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 57205
78.2%
Common 15970
 
21.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
3861
 
6.7%
3849
 
6.7%
3816
 
6.7%
3787
 
6.6%
3785
 
6.6%
998
 
1.7%
957
 
1.7%
947
 
1.7%
946
 
1.7%
942
 
1.6%
Other values (37) 33317
58.2%
Common
ValueCountFrequency (%)
5854
36.7%
2888
18.1%
948
 
5.9%
937
 
5.9%
915
 
5.7%
896
 
5.6%
889
 
5.6%
881
 
5.5%
879
 
5.5%
844
 
5.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 73175
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5854
 
8.0%
3861
 
5.3%
3849
 
5.3%
3816
 
5.2%
3787
 
5.2%
3785
 
5.2%
2888
 
3.9%
998
 
1.4%
957
 
1.3%
948
 
1.3%
Other values (48) 42432
58.0%

voice_2.prompt
Categorical

IMBALANCE  MISSING 

Distinct 2
Distinct (%) 0.1%
Missing 25567
Missing (%) 89.7%
Memory size 1.1 MiB

Length

Max length 77
Median length 57
Mean length 57.266484
Min length 57

Characters and Unicode

Total characters 167619
Distinct characters 30
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
2888
 
10.1%
39
 
0.1%
(Missing) 25567
89.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
5893
18.2%
2927
9.1%
2888
8.9%
2888
8.9%
2888
8.9%
2888
8.9%
2888
8.9%
2888
8.9%
2888
8.9%
2888
8.9%
Other values (10) 390
 
1.2%

Most occurring characters

ValueCountFrequency (%)
29387
17.5%
17523
 
10.5%
14674
 
8.8%
11591
 
6.9%
8937
 
5.3%
8859
 
5.3%
8781
 
5.2%
5932
 
3.5%
5932
 
3.5%
5854
 
3.5%
Other values (20) 50149
29.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 117665
70.2%
Space Separator 29387
 
17.5%
Uppercase Letter 11708
 
7.0%
Other Punctuation 8859
 
5.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
17523
14.9%
14674
12.5%
11591
9.9%
8937
 
7.6%
8859
 
7.5%
8781
 
7.5%
5932
 
5.0%
5932
 
5.0%
5854
 
5.0%
5776
 
4.9%
Other values (10) 23806
20.2%
Uppercase Letter
ValueCountFrequency (%)
5776
49.3%
2966
25.3%
2888
24.7%
39
 
0.3%
39
 
0.3%
Other Punctuation
ValueCountFrequency (%)
5815
65.6%
2927
33.0%
78
 
0.9%
39
 
0.4%
Space Separator
ValueCountFrequency (%)
29387
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 129373
77.2%
Common 38246
 
22.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
17523
13.5%
14674
11.3%
11591
 
9.0%
8937
 
6.9%
8859
 
6.8%
8781
 
6.8%
5932
 
4.6%
5932
 
4.6%
5854
 
4.5%
5776
 
4.5%
Other values (15) 35514
27.5%
Common
ValueCountFrequency (%)
29387
76.8%
5815
 
15.2%
2927
 
7.7%
78
 
0.2%
39
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 167619
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
29387
17.5%
17523
 
10.5%
14674
 
8.8%
11591
 
6.9%
8937
 
5.3%
8859
 
5.3%
8781
 
5.2%
5932
 
3.5%
5932
 
3.5%
5854
 
3.5%
Other values (20) 50149
29.9%

voice_3.GCSData
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 28494
Missing (%) 100.0%
Memory size 222.7 KiB

Distinct 3053
Distinct (%) 99.9%
Missing 25438
Missing (%) 89.3%
Memory size 1.1 MiB
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3043) 3043
 
10.7%
(Missing) 25438
89.3%
ValueCountFrequency (%)
3056
 
10.7%
(Missing) 25438
89.3%
ValueCountFrequency (%)
3056
 
10.7%
(Missing) 25438
89.3%
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3043) 3043
 
10.7%
(Missing) 25438
89.3%
ValueCountFrequency (%)
3056
 
10.7%
(Missing) 25438
89.3%
ValueCountFrequency (%)
3056
 
10.7%
(Missing) 25438
89.3%

voice_3.fileName
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct 3061
Distinct (%) 99.9%
Missing 25429
Missing (%) 89.2%
Memory size 1.0 MiB

Length

Max length 25
Median length 25
Mean length 25
Min length 25

Characters and Unicode

Total characters 76625
Distinct characters 57
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 3057
Unique (%) 99.7%

Common Values

ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3051) 3051
 
10.7%
(Missing) 25429
89.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3051) 3051
99.5%

Most occurring characters

ValueCountFrequency (%)
6130
 
8.0%
4055
 
5.3%
4028
 
5.3%
3992
 
5.2%
3965
 
5.2%
3964
 
5.2%
2589
 
3.4%
2171
 
2.8%
1153
 
1.5%
1043
 
1.4%
Other values (47) 43535
56.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 38966
50.9%
Uppercase Letter 20940
27.3%
Decimal Number 10589
 
13.8%
Connector Punctuation 6130
 
8.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
4055
 
10.4%
4028
 
10.3%
3992
 
10.2%
3965
 
10.2%
3964
 
10.2%
1011
 
2.6%
992
 
2.5%
975
 
2.5%
970
 
2.5%
968
 
2.5%
Other values (15) 14046
36.0%
Uppercase Letter
ValueCountFrequency (%)
1043
 
5.0%
1003
 
4.8%
981
 
4.7%
980
 
4.7%
976
 
4.7%
972
 
4.6%
968
 
4.6%
951
 
4.5%
950
 
4.5%
950
 
4.5%
Other values (12) 11166
53.3%
Decimal Number
ValueCountFrequency (%)
2589
24.4%
2171
20.5%
1153
10.9%
991
 
9.4%
933
 
8.8%
931
 
8.8%
920
 
8.7%
900
 
8.5%
1
 
< 0.1%
Connector Punctuation
ValueCountFrequency (%)
6130
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 59906
78.2%
Common 16719
 
21.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
4055
 
6.8%
4028
 
6.7%
3992
 
6.7%
3965
 
6.6%
3964
 
6.6%
1043
 
1.7%
1011
 
1.7%
1003
 
1.7%
992
 
1.7%
981
 
1.6%
Other values (37) 34872
58.2%
Common
ValueCountFrequency (%)
6130
36.7%
2589
15.5%
2171
 
13.0%
1153
 
6.9%
991
 
5.9%
933
 
5.6%
931
 
5.6%
920
 
5.5%
900
 
5.4%
1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 76625
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6130
 
8.0%
4055
 
5.3%
4028
 
5.3%
3992
 
5.2%
3965
 
5.2%
3964
 
5.2%
2589
 
3.4%
2171
 
2.8%
1153
 
1.5%
1043
 
1.4%
Other values (47) 43535
56.8%

voice_3.prompt
Categorical

IMBALANCE  MISSING 

Distinct 6
Distinct (%) 0.2%
Missing 25429
Missing (%) 89.2%
Memory size 1.6 MiB

Length

Max length 349
Median length 135
Mean length 211.09103
Min length 135

Characters and Unicode

Total characters 646994
Distinct characters 40
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Common Values

ValueCountFrequency (%)
1672
 
5.9%
1351
 
4.7%
35
 
0.1%
3
 
< 0.1%
3
 
< 0.1%
1
 
< 0.1%
(Missing) 25429
89.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
11862
 
9.9%
8883
 
7.4%
4734
 
4.0%
4374
 
3.7%
4123
 
3.5%
3345
 
2.8%
3058
 
2.6%
3023
 
2.5%
2773
 
2.3%
2750
 
2.3%
Other values (146) 70316
59.0%

Most occurring characters

ValueCountFrequency (%)
116176
18.0%
54032
 
8.4%
51591
 
8.0%
48269
 
7.5%
39745
 
6.1%
35856
 
5.5%
29106
 
4.5%
27627
 
4.3%
26582
 
4.1%
26362
 
4.1%
Other values (30) 191648
29.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 507483
78.4%
Space Separator 116176
 
18.0%
Other Punctuation 11708
 
1.8%
Uppercase Letter 7574
 
1.2%
Decimal Number 4053
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
54032
 
10.6%
51591
 
10.2%
48269
 
9.5%
39745
 
7.8%
35856
 
7.1%
29106
 
5.7%
27627
 
5.4%
26582
 
5.2%
26362
 
5.2%
24962
 
4.9%
Other values (14) 143351
28.2%
Uppercase Letter
ValueCountFrequency (%)
2708
35.8%
1796
23.7%
1672
22.1%
1354
17.9%
38
 
0.5%
3
 
< 0.1%
3
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
5854
50.0%
2788
23.8%
1673
 
14.3%
1351
 
11.5%
42
 
0.4%
Decimal Number
ValueCountFrequency (%)
1351
33.3%
1351
33.3%
1351
33.3%
Space Separator
ValueCountFrequency (%)
116176
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 515057
79.6%
Common 131937
 
20.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
54032
 
10.5%
51591
 
10.0%
48269
 
9.4%
39745
 
7.7%
35856
 
7.0%
29106
 
5.7%
27627
 
5.4%
26582
 
5.2%
26362
 
5.1%
24962
 
4.8%
Other values (21) 150925
29.3%
Common
ValueCountFrequency (%)
116176
88.1%
5854
 
4.4%
2788
 
2.1%
1673
 
1.3%
1351
 
1.0%
1351
 
1.0%
1351
 
1.0%
1351
 
1.0%
42
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 646994
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
116176
18.0%
54032
 
8.4%
51591
 
8.0%
48269
 
7.5%
39745
 
6.1%
35856
 
5.5%
29106
 
4.5%
27627
 
4.3%
26582
 
4.1%
26362
 
4.1%
Other values (30) 191648
29.6%

voice_4.GCSData
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 28494
Missing (%) 100.0%
Memory size 222.7 KiB

Distinct 3121
Distinct (%) 99.9%
Missing 25369
Missing (%) 89.0%
Memory size 1.1 MiB
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3111) 3111
 
10.9%
(Missing) 25369
89.0%
ValueCountFrequency (%)
3125
 
11.0%
(Missing) 25369
89.0%
ValueCountFrequency (%)
3125
 
11.0%
(Missing) 25369
89.0%
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3111) 3111
 
10.9%
(Missing) 25369
89.0%
ValueCountFrequency (%)
3125
 
11.0%
(Missing) 25369
89.0%
ValueCountFrequency (%)
3125
 
11.0%
(Missing) 25369
89.0%

voice_4.fileName
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct 3139
Distinct (%) 99.8%
Missing 25350
Missing (%) 89.0%
Memory size 1.0 MiB

Length

Max length 25
Median length 25
Mean length 25
Min length 25

Characters and Unicode

Total characters 78600
Distinct characters 58
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 3134
Unique (%) 99.7%

Common Values

ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3129) 3129
 
11.0%
(Missing) 25350
89.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3129) 3129
99.5%

Most occurring characters

ValueCountFrequency (%)
6288
 
8.0%
4153
 
5.3%
4134
 
5.3%
4107
 
5.2%
4069
 
5.2%
4064
 
5.2%
2695
 
3.4%
2141
 
2.7%
1065
 
1.4%
1048
 
1.3%
Other values (48) 44836
57.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 40010
50.9%
Uppercase Letter 21446
27.3%
Decimal Number 10856
 
13.8%
Connector Punctuation 6288
 
8.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
4153
 
10.4%
4134
 
10.3%
4107
 
10.3%
4069
 
10.2%
4064
 
10.2%
1017
 
2.5%
1016
 
2.5%
1009
 
2.5%
1003
 
2.5%
998
 
2.5%
Other values (15) 14440
36.1%
Uppercase Letter
ValueCountFrequency (%)
1065
 
5.0%
1033
 
4.8%
994
 
4.6%
993
 
4.6%
992
 
4.6%
990
 
4.6%
984
 
4.6%
983
 
4.6%
983
 
4.6%
980
 
4.6%
Other values (12) 11449
53.4%
Decimal Number
ValueCountFrequency (%)
2695
24.8%
2141
19.7%
1048
 
9.7%
973
 
9.0%
963
 
8.9%
956
 
8.8%
946
 
8.7%
914
 
8.4%
216
 
2.0%
4
 
< 0.1%
Connector Punctuation
ValueCountFrequency (%)
6288
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 61456
78.2%
Common 17144
 
21.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
4153
 
6.8%
4134
 
6.7%
4107
 
6.7%
4069
 
6.6%
4064
 
6.6%
1065
 
1.7%
1033
 
1.7%
1017
 
1.7%
1016
 
1.7%
1009
 
1.6%
Other values (37) 35789
58.2%
Common
ValueCountFrequency (%)
6288
36.7%
2695
15.7%
2141
 
12.5%
1048
 
6.1%
973
 
5.7%
963
 
5.6%
956
 
5.6%
946
 
5.5%
914
 
5.3%
216
 
1.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 78600
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6288
 
8.0%
4153
 
5.3%
4134
 
5.3%
4107
 
5.2%
4069
 
5.2%
4064
 
5.2%
2695
 
3.4%
2141
 
2.7%
1065
 
1.4%
1048
 
1.3%
Other values (48) 44836
57.0%

voice_4.prompt
Categorical

IMBALANCE  MISSING 

Distinct 5
Distinct (%) 0.2%
Missing 25350
Missing (%) 89.0%
Memory size 1.5 MiB

Length

Max length 348
Median length 196
Mean length 174.20356
Min length 77

Characters and Unicode

Total characters 547696
Distinct characters 37
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Common Values

ValueCountFrequency (%)
1895
 
6.7%
1199
 
4.2%
42
 
0.1%
7
 
< 0.1%
1
 
< 0.1%
(Missing) 25350
89.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
8297
 
7.8%
7579
 
7.1%
5080
 
4.8%
4301
 
4.0%
3790
 
3.6%
3790
 
3.6%
3094
 
2.9%
3094
 
2.9%
3094
 
2.9%
2398
 
2.3%
Other values (102) 61998
58.2%

Most occurring characters

ValueCountFrequency (%)
103371
18.9%
53202
 
9.7%
45246
 
8.3%
37523
 
6.9%
33942
 
6.2%
25837
 
4.7%
25442
 
4.6%
23205
 
4.2%
22788
 
4.2%
21713
 
4.0%
Other values (27) 155427
28.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 420042
76.7%
Space Separator 103371
 
18.9%
Other Punctuation 14021
 
2.6%
Uppercase Letter 10262
 
1.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
53202
12.7%
45246
10.8%
37523
 
8.9%
33942
 
8.1%
25837
 
6.2%
25442
 
6.1%
23205
 
5.5%
22788
 
5.4%
21713
 
5.2%
20196
 
4.8%
Other values (14) 110948
26.4%
Uppercase Letter
ValueCountFrequency (%)
7025
68.5%
1896
 
18.5%
1199
 
11.7%
43
 
0.4%
42
 
0.4%
42
 
0.4%
8
 
0.1%
7
 
0.1%
Other Punctuation
ValueCountFrequency (%)
7062
50.4%
3802
27.1%
1951
 
13.9%
1206
 
8.6%
Space Separator
ValueCountFrequency (%)
103371
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 430304
78.6%
Common 117392
 
21.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
53202
12.4%
45246
 
10.5%
37523
 
8.7%
33942
 
7.9%
25837
 
6.0%
25442
 
5.9%
23205
 
5.4%
22788
 
5.3%
21713
 
5.0%
20196
 
4.7%
Other values (22) 121210
28.2%
Common
ValueCountFrequency (%)
103371
88.1%
7062
 
6.0%
3802
 
3.2%
1951
 
1.7%
1206
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 547696
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
103371
18.9%
53202
 
9.7%
45246
 
8.3%
37523
 
6.9%
33942
 
6.2%
25837
 
4.7%
25442
 
4.6%
23205
 
4.2%
22788
 
4.2%
21713
 
4.0%
Other values (27) 155427
28.4%

voice_5.GCSData
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 28494
Missing (%) 100.0%
Memory size 222.7 KiB

Distinct 3052
Distinct (%) 99.9%
Missing 25439
Missing (%) 89.3%
Memory size 1.1 MiB
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3042) 3042
 
10.7%
(Missing) 25439
89.3%
ValueCountFrequency (%)
3055
 
10.7%
(Missing) 25439
89.3%
ValueCountFrequency (%)
3055
 
10.7%
(Missing) 25439
89.3%
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3042) 3042
 
10.7%
(Missing) 25439
89.3%
ValueCountFrequency (%)
3055
 
10.7%
(Missing) 25439
89.3%
ValueCountFrequency (%)
3055
 
10.7%
(Missing) 25439
89.3%

voice_5.fileName
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct 3061
Distinct (%) 99.9%
Missing 25429
Missing (%) 89.2%
Memory size 1.0 MiB

Length

Max length 25
Median length 25
Mean length 25
Min length 25

Characters and Unicode

Total characters 76625
Distinct characters 56
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 3057
Unique (%) 99.7%
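
The Distinct and Unique figures are related but not identical. The numbers above are consistent with Distinct counting every different non-missing value and Unique counting only the values that occur exactly once: 3057 values seen once plus 4 values seen twice gives 3061 distinct values over 28494 - 25429 = 3065 non-missing rows. A minimal sketch under that assumption, using only the standard library; `distinct_and_unique` is a hypothetical helper, not part of the profiling tool:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (non-missing count, distinct count, unique count) for a column."""
    present = [v for v in values if v is not None]  # drop missing cells
    counts = Counter(present)
    distinct = len(counts)  # number of different values
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return len(present), distinct, unique


# Hypothetical column; None stands in for a missing cell.
sample = ["a", "a", "b", "c", None]
n, distinct, unique = distinct_and_unique(sample)
print(n, distinct, unique)
print(f"{distinct / n:.1%}", f"{unique / n:.1%}")
```

The percentages are taken relative to the non-missing rows, which is what reproduces the 99.9% and 99.7% reported for this variable.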

Common Values

ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3051) 3051
 
10.7%
(Missing) 25429
89.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3051) 3051
99.5%

Most occurring characters

ValueCountFrequency (%)
6130
 
8.0%
4055
 
5.3%
4028
 
5.3%
3992
 
5.2%
3965
 
5.2%
3964
 
5.2%
2568
 
3.4%
2135
 
2.8%
1121
 
1.5%
1043
 
1.4%
Other values (46) 43624
56.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 38966
50.9%
Uppercase Letter 20940
27.3%
Decimal Number 10589
 
13.8%
Connector Punctuation 6130
 
8.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
4055
 
10.4%
4028
 
10.3%
3992
 
10.2%
3965
 
10.2%
3964
 
10.2%
1011
 
2.6%
992
 
2.5%
975
 
2.5%
970
 
2.5%
968
 
2.5%
Other values (15) 14046
36.0%
Uppercase Letter
ValueCountFrequency (%)
1043
 
5.0%
1003
 
4.8%
981
 
4.7%
980
 
4.7%
976
 
4.7%
972
 
4.6%
968
 
4.6%
951
 
4.5%
950
 
4.5%
950
 
4.5%
Other values (12) 11166
53.3%
Decimal Number
ValueCountFrequency (%)
2568
24.3%
2135
20.2%
1121
10.6%
995
 
9.4%
987
 
9.3%
933
 
8.8%
930
 
8.8%
920
 
8.7%
Connector Punctuation
ValueCountFrequency (%)
6130
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 59906
78.2%
Common 16719
 
21.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
4055
 
6.8%
4028
 
6.7%
3992
 
6.7%
3965
 
6.6%
3964
 
6.6%
1043
 
1.7%
1011
 
1.7%
1003
 
1.7%
992
 
1.7%
981
 
1.6%
Other values (37) 34872
58.2%
Common
ValueCountFrequency (%)
6130
36.7%
2568
15.4%
2135
 
12.8%
1121
 
6.7%
995
 
6.0%
987
 
5.9%
933
 
5.6%
930
 
5.6%
920
 
5.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 76625
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6130
 
8.0%
4055
 
5.3%
4028
 
5.3%
3992
 
5.2%
3965
 
5.2%
3964
 
5.2%
2568
 
3.4%
2135
 
2.8%
1121
 
1.5%
1043
 
1.4%
Other values (46) 43624
56.9%

voice_5.prompt
Categorical

Distinct 4
Distinct (%) 0.1%
Missing 25429
Missing (%) 89.2%
Memory size 2.0 MiB

Length

Max length 349
Median length 348
Mean length 346.92985
Min length 135

Characters and Unicode

Total characters 1063340
Distinct characters 37
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1671
 
5.9%
1338
 
4.7%
43
 
0.2%
13
 
< 0.1%
(Missing) 25429
89.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
16845
 
8.9%
10896
 
5.7%
5685
 
3.0%
5352
 
2.8%
5267
 
2.8%
5013
 
2.6%
4766
 
2.5%
4433
 
2.3%
3385
 
1.8%
3342
 
1.8%
Other values (107) 124646
65.7%

Most occurring characters

ValueCountFrequency (%)
186565
17.5%
119968
11.3%
85506
 
8.0%
69527
 
6.5%
55877
 
5.3%
55076
 
5.2%
54225
 
5.1%
51487
 
4.8%
50627
 
4.8%
42750
 
4.0%
Other values (27) 291732
27.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 834747
78.5%
Space Separator 186565
 
17.5%
Other Punctuation 25694
 
2.4%
Uppercase Letter 16205
 
1.5%
Decimal Number 129
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
119968
14.4%
85506
 
10.2%
69527
 
8.3%
55877
 
6.7%
55076
 
6.6%
54225
 
6.5%
51487
 
6.2%
50627
 
6.1%
42750
 
5.1%
35059
 
4.2%
Other values (13) 214645
25.7%
Uppercase Letter
ValueCountFrequency (%)
11050
68.2%
1757
 
10.8%
1714
 
10.6%
1671
 
10.3%
13
 
0.1%
Other Punctuation
ValueCountFrequency (%)
9779
38.1%
9169
35.7%
5352
20.8%
1351
 
5.3%
43
 
0.2%
Decimal Number
ValueCountFrequency (%)
43
33.3%
43
33.3%
43
33.3%
Space Separator
ValueCountFrequency (%)
186565
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 850952
80.0%
Common 212388
 
20.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
119968
14.1%
85506
 
10.0%
69527
 
8.2%
55877
 
6.6%
55076
 
6.5%
54225
 
6.4%
51487
 
6.1%
50627
 
5.9%
42750
 
5.0%
35059
 
4.1%
Other values (18) 230850
27.1%
Common
ValueCountFrequency (%)
186565
87.8%
9779
 
4.6%
9169
 
4.3%
5352
 
2.5%
1351
 
0.6%
43
 
< 0.1%
43
 
< 0.1%
43
 
< 0.1%
43
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1063340
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
186565
17.5%
119968
11.3%
85506
 
8.0%
69527
 
6.5%
55877
 
5.3%
55076
 
5.2%
54225
 
5.1%
51487
 
4.8%
50627
 
4.8%
42750
 
4.0%
Other values (27) 291732
27.4%

voice_6.GCSData
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 28494
Missing (%) 100.0%
Memory size 222.7 KiB

Distinct 3053
Distinct (%) 99.9%
Missing 25438
Missing (%) 89.3%
Memory size 1.1 MiB
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3043) 3043
 
10.7%
(Missing) 25438
89.3%
ValueCountFrequency (%)
3056
 
10.7%
(Missing) 25438
89.3%
ValueCountFrequency (%)
3056
 
10.7%
(Missing) 25438
89.3%
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3043) 3043
 
10.7%
(Missing) 25438
89.3%
ValueCountFrequency (%)
3056
 
10.7%
(Missing) 25438
89.3%
ValueCountFrequency (%)
3056
 
10.7%
(Missing) 25438
89.3%

voice_6.fileName
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct 3061
Distinct (%) 99.9%
Missing 25429
Missing (%) 89.2%
Memory size 1.0 MiB

Length

Max length 25
Median length 25
Mean length 25
Min length 25

Characters and Unicode

Total characters 76625
Distinct characters 58
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 3057
Unique (%) 99.7%

Common Values

ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3051) 3051
 
10.7%
(Missing) 25429
89.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3051) 3051
99.5%

Most occurring characters

ValueCountFrequency (%)
6130
 
8.0%
4055
 
5.3%
4028
 
5.3%
3992
 
5.2%
3965
 
5.2%
3964
 
5.2%
2663
 
3.5%
1201
 
1.6%
1043
 
1.4%
1011
 
1.3%
Other values (48) 44573
58.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 38966
50.9%
Uppercase Letter 20940
27.3%
Decimal Number 10589
 
13.8%
Connector Punctuation 6130
 
8.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
4055
 
10.4%
4028
 
10.3%
3992
 
10.2%
3965
 
10.2%
3964
 
10.2%
1011
 
2.6%
992
 
2.5%
975
 
2.5%
970
 
2.5%
968
 
2.5%
Other values (15) 14046
36.0%
Uppercase Letter
ValueCountFrequency (%)
1043
 
5.0%
1003
 
4.8%
981
 
4.7%
980
 
4.7%
976
 
4.7%
972
 
4.6%
968
 
4.6%
951
 
4.5%
950
 
4.5%
950
 
4.5%
Other values (12) 11166
53.3%
Decimal Number
ValueCountFrequency (%)
2663
25.1%
1201
11.3%
995
 
9.4%
960
 
9.1%
951
 
9.0%
933
 
8.8%
931
 
8.8%
920
 
8.7%
897
 
8.5%
138
 
1.3%
Connector Punctuation
ValueCountFrequency (%)
6130
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 59906
78.2%
Common 16719
 
21.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
4055
 
6.8%
4028
 
6.7%
3992
 
6.7%
3965
 
6.6%
3964
 
6.6%
1043
 
1.7%
1011
 
1.7%
1003
 
1.7%
992
 
1.7%
981
 
1.6%
Other values (37) 34872
58.2%
Common
ValueCountFrequency (%)
6130
36.7%
2663
15.9%
1201
 
7.2%
995
 
6.0%
960
 
5.7%
951
 
5.7%
933
 
5.6%
931
 
5.6%
920
 
5.5%
897
 
5.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 76625
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6130
 
8.0%
4055
 
5.3%
4028
 
5.3%
3992
 
5.2%
3965
 
5.2%
3964
 
5.2%
2663
 
3.5%
1201
 
1.6%
1043
 
1.4%
1011
 
1.3%
Other values (48) 44573
58.2%

voice_6.prompt
Categorical

IMBALANCE  MISSING 

Distinct 6
Distinct (%) 0.2%
Missing 25429
Missing (%) 89.2%
Memory size 1.6 MiB

Length

Max length 322
Median length 322
Mean length 212.44209
Min length 57

Characters and Unicode

Total characters 651135
Distinct characters 41
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1676
 
5.9%
1335
 
4.7%
34
 
0.1%
14
 
< 0.1%
4
 
< 0.1%
2
 
< 0.1%
(Missing) 25429
89.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
10863
 
9.1%
8506
 
7.1%
6730
 
5.6%
5030
 
4.2%
5028
 
4.2%
4751
 
4.0%
3374
 
2.8%
3366
 
2.8%
3352
 
2.8%
3011
 
2.5%
Other values (109) 65205
54.7%

Most occurring characters

ValueCountFrequency (%)
116151
17.8%
69483
 
10.7%
59520
 
9.1%
44576
 
6.8%
40571
 
6.2%
30448
 
4.7%
29260
 
4.5%
27119
 
4.2%
26462
 
4.1%
22809
 
3.5%
Other values (31) 184736
28.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 504195
77.4%
Space Separator 116151
 
17.8%
Uppercase Letter 15540
 
2.4%
Other Punctuation 15237
 
2.3%
Decimal Number 12
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
69483
13.8%
59520
11.8%
44576
 
8.8%
40571
 
8.0%
30448
 
6.0%
29260
 
5.8%
27119
 
5.4%
26462
 
5.2%
22809
 
4.5%
21673
 
4.3%
Other values (14) 132274
26.2%
Uppercase Letter
ValueCountFrequency (%)
7776
50.0%
1686
 
10.8%
1676
 
10.8%
1676
 
10.8%
1339
 
8.6%
1335
 
8.6%
38
 
0.2%
14
 
0.1%
Other Punctuation
ValueCountFrequency (%)
8131
53.4%
4360
28.6%
1371
 
9.0%
1371
 
9.0%
4
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
4
33.3%
4
33.3%
4
33.3%
Space Separator
ValueCountFrequency (%)
116151
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 519735
79.8%
Common 131400
 
20.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
69483
13.4%
59520
11.5%
44576
 
8.6%
40571
 
7.8%
30448
 
5.9%
29260
 
5.6%
27119
 
5.2%
26462
 
5.1%
22809
 
4.4%
21673
 
4.2%
Other values (22) 147814
28.4%
Common
ValueCountFrequency (%)
116151
88.4%
8131
 
6.2%
4360
 
3.3%
1371
 
1.0%
1371
 
1.0%
4
 
< 0.1%
4
 
< 0.1%
4
 
< 0.1%
4
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 651135
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
116151
17.8%
69483
 
10.7%
59520
 
9.1%
44576
 
6.8%
40571
 
6.2%
30448
 
4.7%
29260
 
4.5%
27119
 
4.2%
26462
 
4.1%
22809
 
3.5%
Other values (31) 184736
28.4%

voice_7.GCSData
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 28494
Missing (%) 100.0%
Memory size 222.7 KiB

Distinct 3253
Distinct (%) 99.9%
Missing 25237
Missing (%) 88.6%
Memory size 1.1 MiB
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3243) 3243
 
11.4%
(Missing) 25237
88.6%
ValueCountFrequency (%)
3257
 
11.4%
(Missing) 25237
88.6%
ValueCountFrequency (%)
3257
 
11.4%
(Missing) 25237
88.6%
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3243) 3243
 
11.4%
(Missing) 25237
88.6%
ValueCountFrequency (%)
3257
 
11.4%
(Missing) 25237
88.6%
ValueCountFrequency (%)
3257
 
11.4%
(Missing) 25237
88.6%

voice_7.fileName
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct 3277
Distinct (%) 99.8%
Missing 25212
Missing (%) 88.5%
Memory size 1.0 MiB

Length

Max length 25
Median length 25
Mean length 25
Min length 25

Characters and Unicode

Total characters 82050
Distinct characters 58
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 3272
Unique (%) 99.7%

Common Values

ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3267) 3267
 
11.5%
(Missing) 25212
88.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3267) 3267
99.5%

Most occurring characters

ValueCountFrequency (%)
6564
 
8.0%
4347
 
5.3%
4313
 
5.3%
4283
 
5.2%
4246
 
5.2%
4244
 
5.2%
2668
 
3.3%
2253
 
2.7%
1266
 
1.5%
1110
 
1.4%
Other values (48) 46756
57.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 41758
50.9%
Uppercase Letter 22399
27.3%
Decimal Number 11329
 
13.8%
Connector Punctuation 6564
 
8.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
4347
 
10.4%
4313
 
10.3%
4283
 
10.3%
4246
 
10.2%
4244
 
10.2%
1081
 
2.6%
1058
 
2.5%
1054
 
2.5%
1054
 
2.5%
1038
 
2.5%
Other values (15) 15040
36.0%
Uppercase Letter
ValueCountFrequency (%)
1110
 
5.0%
1079
 
4.8%
1040
 
4.6%
1038
 
4.6%
1032
 
4.6%
1032
 
4.6%
1029
 
4.6%
1027
 
4.6%
1025
 
4.6%
1023
 
4.6%
Other values (12) 11964
53.4%
Decimal Number
ValueCountFrequency (%)
2668
23.6%
2253
19.9%
1266
11.2%
1019
 
9.0%
1014
 
9.0%
1000
 
8.8%
987
 
8.7%
981
 
8.7%
140
 
1.2%
1
 
< 0.1%
Connector Punctuation
ValueCountFrequency (%)
6564
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 64157
78.2%
Common 17893
 
21.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
4347
 
6.8%
4313
 
6.7%
4283
 
6.7%
4246
 
6.6%
4244
 
6.6%
1110
 
1.7%
1081
 
1.7%
1079
 
1.7%
1058
 
1.6%
1054
 
1.6%
Other values (37) 37342
58.2%
Common
ValueCountFrequency (%)
6564
36.7%
2668
14.9%
2253
 
12.6%
1266
 
7.1%
1019
 
5.7%
1014
 
5.7%
1000
 
5.6%
987
 
5.5%
981
 
5.5%
140
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 82050
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6564
 
8.0%
4347
 
5.3%
4313
 
5.3%
4283
 
5.2%
4246
 
5.2%
4244
 
5.2%
2668
 
3.3%
2253
 
2.7%
1266
 
1.5%
1110
 
1.4%
Other values (48) 46756
57.0%

voice_7.prompt
Categorical

IMBALANCE  MISSING 

Distinct 7
Distinct (%) 0.2%
Missing 25212
Missing (%) 88.5%
Memory size 1.8 MiB

Length

Max length 349
Median length 349
Mean length 286.156
Min length 77

Characters and Unicode

Total characters 939164
Distinct characters 41
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Common Values

ValueCountFrequency (%)
1879
 
6.6%
1336
 
4.7%
53
 
0.2%
7
 
< 0.1%
3
 
< 0.1%
3
 
< 0.1%
1
 
< 0.1%
(Missing) 25212
88.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
11074
 
6.2%
9896
 
5.5%
7520
 
4.2%
6438
 
3.6%
5692
 
3.2%
5161
 
2.9%
4565
 
2.5%
3758
 
2.1%
3758
 
2.1%
3758
 
2.1%
Other values (151) 118193
65.7%

Most occurring characters

ValueCountFrequency (%)
176531
18.8%
81827
 
8.7%
73696
 
7.8%
71175
 
7.6%
51996
 
5.5%
46379
 
4.9%
44155
 
4.7%
43764
 
4.7%
42029
 
4.5%
40534
 
4.3%
Other values (31) 267078
28.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 718439
76.5%
Space Separator 176531
 
18.8%
Other Punctuation 25419
 
2.7%
Uppercase Letter 18754
 
2.0%
Decimal Number 21
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
81827
11.4%
73696
 
10.3%
71175
 
9.9%
51996
 
7.2%
46379
 
6.5%
44155
 
6.1%
43764
 
6.1%
42029
 
5.9%
40534
 
5.6%
27629
 
3.8%
Other values (14) 195255
27.2%
Uppercase Letter
ValueCountFrequency (%)
17226
91.9%
1389
 
7.4%
68
 
0.4%
63
 
0.3%
3
 
< 0.1%
3
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
9835
38.7%
8859
34.9%
4833
19.0%
1885
 
7.4%
7
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
7
33.3%
7
33.3%
7
33.3%
Space Separator
ValueCountFrequency (%)
176531
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 737193
78.5%
Common 201971
 
21.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
81827
 
11.1%
73696
 
10.0%
71175
 
9.7%
51996
 
7.1%
46379
 
6.3%
44155
 
6.0%
43764
 
5.9%
42029
 
5.7%
40534
 
5.5%
27629
 
3.7%
Other values (22) 214009
29.0%
Common
ValueCountFrequency (%)
176531
87.4%
9835
 
4.9%
8859
 
4.4%
4833
 
2.4%
1885
 
0.9%
7
 
< 0.1%
7
 
< 0.1%
7
 
< 0.1%
7
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 939164
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
176531
18.8%
81827
 
8.7%
73696
 
7.8%
71175
 
7.6%
51996
 
5.5%
46379
 
4.9%
44155
 
4.7%
43764
 
4.7%
42029
 
4.5%
40534
 
4.3%
Other values (31) 267078
28.4%

voice_8.GCSData
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 28494
Missing (%) 100.0%
Memory size 222.7 KiB

Distinct 3121
Distinct (%) 99.9%
Missing 25369
Missing (%) 89.0%
Memory size 1.1 MiB
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3111) 3111
 
10.9%
(Missing) 25369
89.0%
ValueCountFrequency (%)
3125
 
11.0%
(Missing) 25369
89.0%
ValueCountFrequency (%)
3125
 
11.0%
(Missing) 25369
89.0%
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3111) 3111
 
10.9%
(Missing) 25369
89.0%
ValueCountFrequency (%)
3125
 
11.0%
(Missing) 25369
89.0%
ValueCountFrequency (%)
3125
 
11.0%
(Missing) 25369
89.0%

voice_8.fileName
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct 3139
Distinct (%) 99.8%
Missing 25350
Missing (%) 89.0%
Memory size 1.0 MiB

Length

Max length 25
Median length 25
Mean length 25
Min length 25

Characters and Unicode

Total characters 78600
Distinct characters 58
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 3134
Unique (%) 99.7%

Common Values

ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3129) 3129
 
11.0%
(Missing) 25350
89.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3129) 3129
99.5%

Most occurring characters

ValueCountFrequency (%)
6288
 
8.0%
4153
 
5.3%
4134
 
5.3%
4107
 
5.2%
4069
 
5.2%
4064
 
5.2%
2110
 
2.7%
1687
 
2.1%
1065
 
1.4%
1033
 
1.3%
Other values (48) 45890
58.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 40010
50.9%
Uppercase Letter 21446
27.3%
Decimal Number 10856
 
13.8%
Connector Punctuation 6288
 
8.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
4153
 
10.4%
4134
 
10.3%
4107
 
10.3%
4069
 
10.2%
4064
 
10.2%
1017
 
2.5%
1016
 
2.5%
1009
 
2.5%
1003
 
2.5%
998
 
2.5%
Other values (15) 14440
36.1%
Uppercase Letter
ValueCountFrequency (%)
1065
 
5.0%
1033
 
4.8%
994
 
4.6%
993
 
4.6%
992
 
4.6%
990
 
4.6%
984
 
4.6%
983
 
4.6%
983
 
4.6%
980
 
4.6%
Other values (12) 11449
53.4%
Decimal Number
ValueCountFrequency (%)
2110
19.4%
1687
15.5%
1016
9.4%
1006
9.3%
999
9.2%
973
9.0%
956
8.8%
946
8.7%
944
8.7%
219
 
2.0%
Connector Punctuation
ValueCountFrequency (%)
6288
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 61456
78.2%
Common 17144
 
21.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
4153
 
6.8%
4134
 
6.7%
4107
 
6.7%
4069
 
6.6%
4064
 
6.6%
1065
 
1.7%
1033
 
1.7%
1017
 
1.7%
1016
 
1.7%
1009
 
1.6%
Other values (37) 35789
58.2%
Common
ValueCountFrequency (%)
6288
36.7%
2110
 
12.3%
1687
 
9.8%
1016
 
5.9%
1006
 
5.9%
999
 
5.8%
973
 
5.7%
956
 
5.6%
946
 
5.5%
944
 
5.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 78600
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6288
 
8.0%
4153
 
5.3%
4134
 
5.3%
4107
 
5.2%
4069
 
5.2%
4064
 
5.2%
2110
 
2.7%
1687
 
2.1%
1065
 
1.4%
1033
 
1.3%
Other values (48) 45890
58.4%

voice_8.prompt
Categorical

IMBALANCE  MISSING 

Distinct 6
Distinct (%) 0.2%
Missing 25350
Missing (%) 89.0%
Memory size 1.5 MiB

Length

Max length 349
Median length 77
Mean length 183.46024
Min length 57

Characters and Unicode

Total characters 576799
Distinct characters 34
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Common Values

ValueCountFrequency (%)
1902
 
6.7%
1197
 
4.2%
36
 
0.1%
4
 
< 0.1%
3
 
< 0.1%
2
 
< 0.1%
(Missing) 25350
89.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
9317
 
9.3%
7366
 
7.4%
4912
 
4.9%
3591
 
3.6%
3143
 
3.1%
2434
 
2.4%
2394
 
2.4%
2394
 
2.4%
2394
 
2.4%
2394
 
2.4%
Other values (111) 59527
59.6%

Most occurring characters

ValueCountFrequency (%)
96722
16.8%
65038
 
11.3%
43371
 
7.5%
35667
 
6.2%
32910
 
5.7%
31255
 
5.4%
30638
 
5.3%
24704
 
4.3%
23648
 
4.1%
23027
 
4.0%
Other values (24) 169819
29.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 447950
77.7%
Space Separator 96722
 
16.8%
Other Punctuation 19447
 
3.4%
Uppercase Letter 12680
 
2.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
65038
14.5%
43371
 
9.7%
35667
 
8.0%
32910
 
7.3%
31255
 
7.0%
30638
 
6.8%
24704
 
5.5%
23648
 
5.3%
23027
 
5.1%
22962
 
5.1%
Other values (13) 114730
25.6%
Uppercase Letter
ValueCountFrequency (%)
5270
41.6%
3099
24.4%
1902
 
15.0%
1201
 
9.5%
1200
 
9.5%
8
 
0.1%
Other Punctuation
ValueCountFrequency (%)
7931
40.8%
5621
28.9%
3952
20.3%
1943
 
10.0%
Space Separator
ValueCountFrequency (%)
96722
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 460630
79.9%
Common 116169
 
20.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
65038
14.1%
43371
 
9.4%
35667
 
7.7%
32910
 
7.1%
31255
 
6.8%
30638
 
6.7%
24704
 
5.4%
23648
 
5.1%
23027
 
5.0%
22962
 
5.0%
Other values (19) 127410
27.7%
Common
ValueCountFrequency (%)
96722
83.3%
7931
 
6.8%
5621
 
4.8%
3952
 
3.4%
1943
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 576799
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
96722
16.8%
65038
 
11.3%
43371
 
7.5%
35667
 
6.2%
32910
 
5.7%
31255
 
5.4%
30638
 
5.3%
24704
 
4.3%
23648
 
4.1%
23027
 
4.0%
Other values (24) 169819
29.4%

Distinct 3254
Distinct (%) 99.9%
Missing 25236
Missing (%) 88.6%
Memory size 1.1 MiB
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3244) 3244
 
11.4%
(Missing) 25236
88.6%
ValueCountFrequency (%)
3258
 
11.4%
(Missing) 25236
88.6%
ValueCountFrequency (%)
3258
 
11.4%
(Missing) 25236
88.6%
ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3244) 3244
 
11.4%
(Missing) 25236
88.6%
ValueCountFrequency (%)
3258
 
11.4%
(Missing) 25236
88.6%
ValueCountFrequency (%)
3258
 
11.4%
(Missing) 25236
88.6%

voice_9.fileName
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct 3277
Distinct (%) 99.8%
Missing 25212
Missing (%) 88.5%
Memory size 1.0 MiB

Length

Max length 26
Median length 25
Mean length 25.001828
Min length 25

Characters and Unicode

Total characters 82056
Distinct characters 58
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 3272
Unique (%) 99.7%

Common Values

ValueCountFrequency (%)
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
2
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3267) 3267
 
11.5%
(Missing) 25212
88.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%
2
 
0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (3267) 3267
99.5%

Most occurring characters

ValueCountFrequency (%)
6564
 
8.0%
4347
 
5.3%
4313
 
5.3%
4283
 
5.2%
4246
 
5.2%
4244
 
5.2%
3894
 
4.7%
1324
 
1.6%
1110
 
1.4%
1081
 
1.3%
Other values (48) 46650
56.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 41758
50.9%
Uppercase Letter 22399
27.3%
Decimal Number 11335
 
13.8%
Connector Punctuation 6564
 
8.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
4347
 
10.4%
4313
 
10.3%
4283
 
10.3%
4246
 
10.2%
4244
 
10.2%
1081
 
2.6%
1058
 
2.5%
1054
 
2.5%
1054
 
2.5%
1038
 
2.5%
Other values (15) 15040
36.0%
Uppercase Letter
ValueCountFrequency (%)
1110
 
5.0%
1079
 
4.8%
1040
 
4.6%
1038
 
4.6%
1032
 
4.6%
1032
 
4.6%
1029
 
4.6%
1027
 
4.6%
1025
 
4.6%
1023
 
4.6%
Other values (12) 11964
53.4%
Decimal Number
ValueCountFrequency (%)
3894
34.4%
1324
 
11.7%
1062
 
9.4%
1051
 
9.3%
1014
 
8.9%
1008
 
8.9%
1000
 
8.8%
970
 
8.6%
6
 
0.1%
6
 
0.1%
Connector Punctuation
ValueCountFrequency (%)
6564
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 64157
78.2%
Common 17899
 
21.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
4347
 
6.8%
4313
 
6.7%
4283
 
6.7%
4246
 
6.6%
4244
 
6.6%
1110
 
1.7%
1081
 
1.7%
1079
 
1.7%
1058
 
1.6%
1054
 
1.6%
Other values (37) 37342
58.2%
Common
ValueCountFrequency (%)
6564
36.7%
3894
21.8%
1324
 
7.4%
1062
 
5.9%
1051
 
5.9%
1014
 
5.7%
1008
 
5.6%
1000
 
5.6%
970
 
5.4%
6
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 82056
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6564
 
8.0%
4347
 
5.3%
4313
 
5.3%
4283
 
5.2%
4246
 
5.2%
4244
 
5.2%
3894
 
4.7%
1324
 
1.6%
1110
 
1.4%
1081
 
1.3%
Other values (48) 46650
56.9%

voice_9.prompt
Categorical

IMBALANCE  MISSING 

Distinct 6
Distinct (%) 0.2%
Missing 25212
Missing (%) 88.5%
Memory size 1.1 MiB

Length

Max length 349
Median length 58
Mean length 59.881779
Min length 34

Characters and Unicode

Total characters 196532
Distinct characters 38
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Common Values

ValueCountFrequency (%)
3252
 
11.4%
16
 
0.1%
7
 
< 0.1%
3
 
< 0.1%
3
 
< 0.1%
1
 
< 0.1%
(Missing) 25212
88.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
3303
9.7%
3275
9.6%
3268
9.6%
3260
9.6%
3259
9.6%
3258
9.6%
3252
9.6%
3252
9.6%
3252
9.6%
3252
9.6%
Other values (114) 1357
4.0%

Most occurring characters

ValueCountFrequency (%)
30706
15.6%
26340
13.4%
20224
10.3%
16682
8.5%
13613
 
6.9%
13223
 
6.7%
10133
 
5.2%
10012
 
5.1%
10010
 
5.1%
7051
 
3.6%
Other values (28) 38538
19.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 158976
80.9%
Space Separator 30706
 
15.6%
Other Punctuation 3447
 
1.8%
Uppercase Letter 3394
 
1.7%
Decimal Number 9
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
26340
16.6%
20224
12.7%
16682
10.5%
13613
8.6%
13223
8.3%
10133
 
6.4%
10012
 
6.3%
10010
 
6.3%
7051
 
4.4%
6901
 
4.3%
Other values (13) 24787
15.6%
Uppercase Letter
ValueCountFrequency (%)
3259
96.0%
116
 
3.4%
9
 
0.3%
6
 
0.2%
3
 
0.1%
1
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
3276
95.0%
67
 
1.9%
64
 
1.9%
37
 
1.1%
3
 
0.1%
Decimal Number
ValueCountFrequency (%)
3
33.3%
3
33.3%
3
33.3%
Space Separator
ValueCountFrequency (%)
30706
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 162370
82.6%
Common 34162
 
17.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
26340
16.2%
20224
12.5%
16682
10.3%
13613
8.4%
13223
8.1%
10133
 
6.2%
10012
 
6.2%
10010
 
6.2%
7051
 
4.3%
6901
 
4.3%
Other values (19) 28181
17.4%
Common
ValueCountFrequency (%)
30706
89.9%
3276
 
9.6%
67
 
0.2%
64
 
0.2%
37
 
0.1%
3
 
< 0.1%
3
 
< 0.1%
3
 
< 0.1%
3
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 196532
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
30706
15.6%
26340
13.4%
20224
10.3%
16682
8.5%
13613
 
6.9%
13223
 
6.7%
10133
 
5.2%
10012
 
5.1%
10010
 
5.1%
7051
 
3.6%
Other values (28) 38538
19.6%

Interactions

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
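
One common way to obtain such a nullity correlation is to take the Pearson correlation between the missing/present indicators of two columns; whether this report uses exactly that computation is an assumption here. The sketch below uses only the standard library, and the function name and sample columns are hypothetical. A value near 1 means the two columns tend to be missing in the same rows, and a value near -1 means one tends to be present when the other is missing.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two equally long columns."""
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    # 1.0 marks a missing cell (None), 0.0 a present one.
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # undefined when a column is never or always missing
    return cov / sqrt(var_a * var_b)


# Hypothetical columns; None stands in for a missing cell.
file_name = ["f1", None, "f2", None]
prompt = ["p1", None, "p2", None]
other = [None, "x", None, "y"]
print(nullity_correlation(file_name, prompt))
print(nullity_correlation(file_name, other))
```

Columns that are never or always missing have no variance in their indicator, so the sketch returns `None` for them rather than a number. Several columns in this section report identical missing counts (for example voice_7.fileName and voice_9.fileName, both 25212), which is the kind of pattern such a heatmap would surface, although the counts alone do not show that the same rows are missing.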